Rec'd 04 APR 2005 (WIPO, PCT)

European Patent Office

Certificate

The attached documents are exact copies of the European patent application
described on the following page, as originally filed.

Patent application No. 04101405.1

PRIORITY DOCUMENT
SUBMITTED OR TRANSMITTED IN COMPLIANCE WITH RULE 17.1(a) OR (b)

For the President of the European Patent Office
p.o.
R C van Dijk

EPA/EPO/OEB Form 1014.1 - 02.2000 7001014



European Patent Office

Application no.: 04101405.1

Date of filing: 05.04.04

Applicant(s):
Koninklijke Philips Electronics N.V.
Groenewoudseweg 1
5621 BA Eindhoven
THE NETHERLANDS

Title of the invention:
(If no title is shown please refer to the description.)

Multi-channel parametric audio coding

Priority(ies) claimed
State/Date/File no.:

International Patent Classification:
H04N7/64

Contracting states designated at date of filing:
AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL
PL PT RO SE SI SK TR LI

EPA/EPO/OEB Form 1014.2 - 01.2000 7001014



2 



PHNL0403 8 8EPQ 

1 

Multi-channel parametric audio coding 



02.04.2004 



This document gives a technical description of a multi-channel parametric
audio coding system as developed by Philips. The goal of this system is to describe an
m-channel signal by an n-channel signal, with n<m, and parameters describing a spatial image
in order to reconstruct the m-channel signal. Although the techniques described in this
document could be extended to coding any m to any n channels, the embodiments described
in this document concern coding from 5(.1) to 2 or from 5(.1) to 1 channel.
The extension ".1" denotes the presence of an LFE channel. Furthermore, it is
assumed that when reproducing the multi-channel signals, a typical loudspeaker setup is used
consisting of a Left front (Lf), a Right front (Rf), a Centre (Cf), a Left surround (Ls), a Right
surround (Rs) and optionally a low-frequency effects (LFE) speaker.



HIGH LEVEL DESCRIPTION 

Figure 1 shows a general block diagram of the multi-channel parametric
encoder. The multi-channel input signal, consisting of the five channels Lf, Rf, Cf, Ls, Rs and
the optional LFE channel, is analyzed, resulting in a set of parameters describing the spatial
image. Depending on the configuration, either a mono down-mix channel M or a
stereo down-mix consisting of the left and right channels Ld and Rd is generated. This mono
(M) or stereo (Ld, Rd) signal is then encoded using a conventional mono or stereo audio
encoder respectively. The bit stream resulting from this encoding process is merged with a bit
stream derived from the coded spatial parameters, preferably in a backwards compatible
fashion.

Figure 1: Block diagram of generalized multi-channel parametric encoder.

A block diagram of the corresponding multi-channel parametric decoder is
given in Figure 2. First the bit stream is de-multiplexed, resulting in a (backwards compatible)
bit stream for the mono or stereo audio decoder and a spatial parameter bit stream. The mono
or stereo decoder then reconstructs the coded mono down-mix signal M' or the stereo
down-mix signal (Ld', Rd') respectively. Concurrently, the spatial parameters are decoded. Finally
the multi-channel signal is reconstructed by imposing the spatial parameters onto the
down-mix channel(s).

Figure 2: Block diagram of generalized multi-channel parametric decoder.

DETAILED TECHNICAL DESCRIPTION

Time/frequency transform

The multi-channel analysis and reconstruction blocks require spatial analysis
and synthesis to be performed on individual time/frequency tiles. Therefore, a time/frequency
transform is required with the following prerequisites:

- The transform is preferably complex, to enable measurement and modification
of (relative) phase values between input and output channels;

- The transform should be oversampled, to avoid aliasing distortion which
would result from time- and frequency-dependent changes in a critically sampled system;

- The frequency resolution should be non-uniform, following the frequency resolution of the
human auditory system;

- The time resolution is generally rather low, except in the case of transients.

A generalized block diagram of a spatial encoder is shown in Figure 3. A
multi-channel input signal is first transformed to the frequency domain. Subsequently, a
downmix and spatial parameters are generated. The downmixed signals are subsequently
transformed to the time domain. The decoder basically performs the inverse process.

Figure 3: Block diagram of generalized encoder processing stage.

Currently, two time/frequency transforms are used which meet the prerequisites mentioned
above. The first transform comprises time-domain segmentation followed by FFTs. All input
signals are segmented and transformed by means of the Discrete Fourier Transform (DFT):

$$X_{i,l}[k] = \sum_{n=0}^{N-1} x_i[n + l \cdot h] \cdot h_a[n] \cdot \exp\left(-j\frac{2\pi k n}{N}\right),$$

with $x_i[n]$ the $i$-th time-domain input signal, $h_a[n]$ the analysis window, $l$ the frame
index, $h$ the frame update in samples, $N$ the DFT length and $X_{i,l}[k]$ the DFT with frequency
index $k$. At the output of the multi-channel encoder/decoder the processed signals $y_i[n]$ are
transformed back to the time domain by means of an inverse DFT:

$$y_{i,l}[n] = 2 \cdot h_s[n] \cdot \Re\left\{\sum_{k} Y_{i,l}[k] \cdot \exp\left(j\frac{2\pi k n}{N}\right)\right\},$$

with $Y_{i,l}[k]$ the (zero-padded) DFT representation, $y_{i,l}[n]$ the time-domain segment
corresponding to frame $l$ and $h_s[n]$ the synthesis window. The resulting output signal $y_i[n]$
is obtained by overlap-add of the segments $y_{i,l}[n]$.
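
The segmentation, windowed DFT and overlap-add synthesis described above can be sketched as follows. This is a minimal illustration and not the implementation used in the system: the square-root Hann windows, the DFT length N = 1024 and the frame update h = N/2 are assumptions chosen so that overlap-add reconstructs the input, and the function names are hypothetical.

```python
import numpy as np

def analyse(x, N=1024, hop=512):
    """Segment x, apply the analysis window h_a and take an N-point DFT per frame l."""
    h_a = np.sqrt(np.hanning(N + 1)[:N])          # assumed window: sqrt of a periodic Hann
    n_frames = 1 + (len(x) - N) // hop
    frames = np.stack([x[l * hop:l * hop + N] * h_a for l in range(n_frames)])
    return np.fft.fft(frames, axis=1)             # row l holds X_l[k]

def synthesise(X, N=1024, hop=512):
    """Inverse DFT per frame, synthesis window h_s, then overlap-add."""
    h_s = np.sqrt(np.hanning(N + 1)[:N])
    Y = np.array(X, dtype=complex)
    Y[:, N // 2 + 1:] = 0.0                       # keep non-negative frequencies only
    Y[:, 0] *= 0.5                                # DC and Nyquist are halved so that
    Y[:, N // 2] *= 0.5                           # 2*Re{.} restores the real segment
    y = np.zeros(hop * (len(Y) - 1) + N)
    for l, spectrum in enumerate(Y):
        segment = 2.0 * h_s * np.real(np.fft.ifft(spectrum))
        y[l * hop:l * hop + N] += segment         # overlap-add of the segments
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(1024 + 7 * 512)
y = synthesise(analyse(x))
print(np.max(np.abs(x[512:-512] - y[512:-512])))  # interior samples are reconstructed
```

With these window choices the product of analysis and synthesis window is a Hann window, which sums to one at 50% overlap; the first and last half frame are covered by a single window only and are therefore not reconstructed exactly.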

The second transform, which is of particular interest for memory and
computational complexity reasons, is a complex-exponential modulated filter bank.

In the following sections, it is assumed that individual time/frequency tiles of
all input channels are available for processing at both the encoder and decoder side.



System based on stereo downmix
Encoder 

The aim of this encoder is to represent a 5.1-channel input signal as a
backwards-compatible stereo signal (i.e., with a spatial representation which resembles the
5.1-channel reconstruction as closely as possible), combined with spatial parameters that enable
reconstruction of a 5.1-channel output that resembles the original 5.1 input signals from a
perceptual point of view. The structure of this system is depicted in Figure 4.

Figure 4: Block diagram of an encoder providing a stereo downmix.

The encoder consists of three parallel stages which all convert a stereo input
signal to a mono signal, combined with parameter extraction which represents the spatial
cues between the respective input signals. Each of these three parallel blocks computes
(assuming input signals X1 and X2, and output signal Xm):

The ratio of the powers of corresponding time/frequency tiles of the input
signals (which will be denoted 'Interchannel Level Difference', or ILD; elsewhere in this
document also abbreviated IID), given by:

$$\mathrm{ILD} = 10 \log_{10}\left(\frac{\sum_k X_1[k] X_1^*[k]}{\sum_k X_2[k] X_2^*[k]}\right) \qquad (1)$$

The average phase difference (or the phase difference which maximizes the
correlation between the input signals), which is referred to as 'Interchannel Phase
Difference', or IPD, given by:

$$\mathrm{IPD} = \angle\left(\sum_k X_1[k] X_2^*[k]\right) \qquad (2)$$

The coherence (ICC), which is the normalized cross-correlation between the
input signals given the IPD value as in (2):

$$\mathrm{ICC} = \frac{\left|\sum_k X_1[k] X_2^*[k]\right|}{\sqrt{\sum_k X_1[k] X_1^*[k] \, \sum_k X_2[k] X_2^*[k]}} \qquad (3)$$
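
The three cues can be computed per time/frequency tile as sketched below. This is a minimal illustration assuming the usual definitions (power ratio in dB, angle of the complex cross-spectrum, magnitude of the normalized cross-spectrum); the function name `spatial_cues` is hypothetical.

```python
import numpy as np

def spatial_cues(X1, X2):
    """Return (ILD in dB, IPD in radians, ICC) for one tile of two complex spectra."""
    p1 = np.sum(np.abs(X1) ** 2)                  # power of first input in the tile
    p2 = np.sum(np.abs(X2) ** 2)                  # power of second input in the tile
    cross = np.sum(X1 * np.conj(X2))              # complex cross-spectrum
    ild = 10.0 * np.log10(p1 / p2)
    ipd = np.angle(cross)
    icc = np.abs(cross) / np.sqrt(p1 * p2)        # coherence after phase alignment
    return ild, ipd, icc

# Second channel: half the amplitude of the first, lagging by 0.3 rad.
X1 = np.array([1 + 2j, -0.5 + 0.1j, 0.3 - 0.7j])
X2 = 0.5 * X1 * np.exp(-0.3j)
print(spatial_cues(X1, X2))
```

For this fully coherent pair the level difference is about 6 dB, the phase difference 0.3 rad and the coherence 1.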



Each parallel processing block generates a parameter set which comprises all
or a selected number of these parameters for each time/frequency tile (depending on the
desired parameter bitrate, some parameters are not transmitted). Besides parameters, each
parallel stage also computes a single output signal, which is a linear (complex) combination
of the two input signals:

$$X_m[k] = w_1 X_1[k] + w_2 X_2[k],$$

with $w_1$ and $w_2$ complex weights which depend on the extracted parameters
($w_i = f(\mathrm{IID}, \mathrm{IPD}, \mathrm{ICC})$). Preferably each time/frequency tile of $X_m$ has a power
that is equal to the sum of the powers of the input signals $X_1$ and $X_2$.

A fourth parameter which is required for reconstruction of phase differences in 
the encoder is the Overall Phase Difference (or OPD), which is defined as the average phase 
difference between the first input signal and the mono output signal. 
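
A two-to-one stage with a power-preserving output and an OPD estimate could look like the sketch below. The text does not specify the weight function, so the choice made here is an assumption: the second input is rotated by the IPD so that it adds in phase with the first, and the sum is rescaled to the summed input power. The function name `downmix_pair` is hypothetical.

```python
import numpy as np

def downmix_pair(X1, X2):
    """Combine one tile of two complex spectra into a mono tile; return (Xm, OPD)."""
    ipd = np.angle(np.sum(X1 * np.conj(X2)))
    w1 = 1.0
    w2 = np.exp(1j * ipd)                         # assumed weight: phase-align X2 to X1
    Xm = w1 * X1 + w2 * X2
    target = np.sum(np.abs(X1) ** 2) + np.sum(np.abs(X2) ** 2)
    power = np.sum(np.abs(Xm) ** 2)
    gain = np.sqrt(target / power) if power > 0 else 0.0
    Xm = gain * Xm                                # tile power equals summed input power
    opd = np.angle(np.sum(X1 * np.conj(Xm)))      # phase of first input relative to mono
    return Xm, opd

X1 = np.array([1 + 1j, 2 - 1j, -0.5 + 0.3j])
Xm, opd = downmix_pair(X1, -X1)                   # anti-phase inputs do not cancel
print(np.sum(np.abs(Xm) ** 2), opd)
```

Without the phase alignment, the anti-phase pair in the example would sum to zero; with it, the mono tile keeps the full input power.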

The next stage of the encoder performs a down-mix from three (L, C, R) to
two down-mix channels ($L_0$, $R_0$), combined with corresponding spatial parameters. Each
down-mix channel is a linear combination of the input signals L, R and C:

$$L_0[k] = \alpha_1 L[k] + \alpha_2 R[k] + \alpha_3 C[k],$$
$$R_0[k] = \beta_1 L[k] + \beta_2 R[k] + \beta_3 C[k].$$

The parameters $\alpha_i$ and $\beta_i$ are chosen such that a good stereo image of the
stereo signal consisting of $L_0[k]$ and $R_0[k]$ is obtained. One of the prerequisites for a good
stereo image is that $\alpha_3$ equals $\beta_3$.

At the decoder, channels L, R and C are predicted using the two down-mix
channels $L_0$ and $R_0$ as follows:

$$\hat{L}[k] = C_{1,L} L_0[k] + C_{2,L} R_0[k],$$
$$\hat{R}[k] = C_{1,R} L_0[k] + C_{2,R} R_0[k],$$
$$\hat{C}[k] = C_{1,C} L_0[k] + C_{2,C} R_0[k].$$

To this end, parameters $C_{1,Z}$ and $C_{2,Z}$ (for Z = L, R or C) are computed at the
encoder and sent to the decoder.

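A minimal Euclidean norm of the difference of signal $Z[k]$ and its estimation
$\hat{Z}[k]$ is used as optimization criterion to find the parameters $C_{1,Z}$ and $C_{2,Z}$. The square of the
Euclidean norm of the difference of the original input channel $Z[k]$ and its estimation at the
decoder $\hat{Z}[k]$ can be written as:

$$\sum_k \left| Z[k] - \hat{Z}[k] \right|^2,$$

with

$$\hat{Z}[k] = C_{1,Z} L_0[k] + C_{2,Z} R_0[k].$$

Minimization of $\sum_k |Z[k] - \hat{Z}[k]|^2$ leads to the following expressions:

$$C_{1,Z} = \frac{\langle L_0[k], Z[k]\rangle^* \, \|R_0[k]\|^2 - \langle R_0[k], Z[k]\rangle^* \, \langle L_0[k], R_0[k]\rangle^*}{\|L_0[k]\|^2 \, \|R_0[k]\|^2 - |\langle L_0[k], R_0[k]\rangle|^2},$$

$$C_{2,Z} = \frac{\langle R_0[k], Z[k]\rangle^* \, \|L_0[k]\|^2 - \langle L_0[k], Z[k]\rangle^* \, \langle L_0[k], R_0[k]\rangle}{\|L_0[k]\|^2 \, \|R_0[k]\|^2 - |\langle L_0[k], R_0[k]\rangle|^2},$$

with

$$\|A[k]\|^2 = \sum_k |A[k]|^2,$$

$$\langle A[k], B[k]\rangle = \sum_k A[k] B^*[k].$$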

For the coefficients $C_{1,Z}$ and $C_{2,Z}$ the following relations can be derived:

$$\alpha_1 C_{1,L} + \alpha_2 C_{1,R} + \alpha_3 C_{1,C} = 1,$$
$$\beta_1 C_{2,L} + \beta_2 C_{2,R} + \beta_3 C_{2,C} = 1,$$
$$\alpha_1 C_{2,L} + \alpha_2 C_{2,R} + \alpha_3 C_{2,C} = 0,$$
$$\beta_1 C_{1,L} + \beta_2 C_{1,R} + \beta_3 C_{1,C} = 0.$$

Having 6 variables ($C_{1,L}$, $C_{2,L}$, $C_{1,R}$, $C_{2,R}$, $C_{1,C}$ and $C_{2,C}$) and at the same time 4
relations between these parameters, only 2 parameters need to be sent to the decoder. $C_{1,L}$
(henceforth denoted by β) and $C_{2,R}$ (henceforth denoted by γ) are transmitted to the decoder
because of their similar statistical distributions.
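
The least-squares prediction coefficients and the relations between them can be checked numerically with the sketch below. It solves the two normal equations of the minimization in closed form; the down-mix weights (1, 0, a) and (0, 1, a) with a = 0.7 and the random test signals are assumptions made for the example only, and the function names are hypothetical.

```python
import numpy as np

def inner(A, B):
    """<A, B> = sum_k A[k] * conj(B[k])."""
    return np.sum(A * np.conj(B))

def prediction_coeffs(Z, L0, R0):
    """Least-squares (C1, C2) such that Z is approximated by C1*L0 + C2*R0."""
    nL = inner(L0, L0).real
    nR = inner(R0, R0).real
    lr = inner(L0, R0)
    det = nL * nR - abs(lr) ** 2
    c1 = (inner(Z, L0) * nR - inner(Z, R0) * np.conj(lr)) / det
    c2 = (inner(Z, R0) * nL - inner(Z, L0) * lr) / det
    return c1, c2

rng = np.random.default_rng(1)
L, R, C = (rng.standard_normal(64) + 1j * rng.standard_normal(64) for _ in range(3))
a = 0.7                                           # assumed common centre weight
L0, R0 = L + a * C, R + a * C                     # down-mix with weights (1,0,a), (0,1,a)
c = {name: prediction_coeffs(Z, L0, R0) for name, Z in (("L", L), ("R", R), ("C", C))}
print(c["L"][0] + a * c["C"][0])                  # first relation, close to 1
print(c["R"][0] + a * c["C"][0])                  # fourth relation, close to 0
```

The relations hold because the down-mix channels are themselves linear combinations of L, R and C, so predicting them from ($L_0$, $R_0$) must return them unchanged.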

Before the actual downmix is applied (i.e., L, C and R are combined to $L_0$ and
$R_0$), the signals L, C and R are preferably pre-conditioned by first applying a phase shift to L,
R, and/or C to ensure that the signal pair L and C on the one hand and R and C on the other
hand have a non-negative correlation, to minimize energy loss by the downmix process. The
applied phase shifts can be transmitted without any bit-rate cost by superimposing these
phase shifts on the OPD values resulting from the individual two-to-one stages.

If no LFE channel is present at the encoder input, the LFE channel can be 
considered as containing zeros only and all related processing steps can be discarded. In that 
case, parameter set 2 as shown in Figure 4 is irrelevant and does not have to be transmitted. 



Decoder 

The decoder basically performs the reverse of the process depicted in Figure 4.
In a first stage, the stereo input signal ($L_0$, $R_0$) is converted to a three-channel signal (L, C,
R) based on parameter set 4. The upmix, based on the transmitted parameters β and γ, is
performed as follows:

$$L[k] = \beta L_0[k] + (\gamma - 1) R_0[k],$$
$$R[k] = (\beta - 1) L_0[k] + \gamma R_0[k],$$
$$C[k] = (1 - \beta) L_0[k] + (1 - \gamma) R_0[k].$$

If a 3-channel reconstruction is desired, the decoding process is finished.
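
The two-to-three upmix can be written directly from the three equations, as in the sketch below (the function name `upmix_2_to_3` is hypothetical). A property that follows from these equations is that L + C returns $L_0$ and R + C returns $R_0$ for any β and γ, which the example checks.

```python
import numpy as np

def upmix_2_to_3(L0, R0, beta, gamma):
    """Predict L, R and C from the stereo down-mix using the transmitted beta, gamma."""
    L = beta * L0 + (gamma - 1.0) * R0
    R = (beta - 1.0) * L0 + gamma * R0
    C = (1.0 - beta) * L0 + (1.0 - gamma) * R0
    return L, R, C

L0 = np.array([1.0, 2.0, -0.5])
R0 = np.array([0.5, -1.0, 0.25])
L, R, C = upmix_2_to_3(L0, R0, beta=0.8, gamma=0.6)
print(np.allclose(L + C, L0), np.allclose(R + C, R0))
```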

For e.g. a 3.1, 5 or 5.1-channel reconstruction, the mono signals L, C, and R
are subdivided into two-channel signals, based on the spatial parameters corresponding to
each signal. The general structure of the mono-to-stereo upmix block is given in Figure 5.

Figure 5: Generalized structure of the mono-to-stereo upmix stage (mono input X,
parameters, stereo output Y1 and Y2).

A local copy of the mono input signal X is fed through a decorrelation filter H.
This filter can be implemented as a (frequency-dependent) delay or reverberation module.
The mono input signal X and its decorrelated copy Q are subsequently combined in a mixing
stage to form the stereo output signal ($Y_1$, $Y_2$):
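
$$\begin{bmatrix} Y_1[k] \\ Y_2[k] \end{bmatrix} =
\begin{bmatrix} \cos(\alpha) & -\sin(\alpha) \\ \sin(\alpha) & \cos(\alpha) \end{bmatrix}
\begin{bmatrix} \cos(\gamma) & 0 \\ 0 & \sin(\gamma) \end{bmatrix}
\begin{bmatrix} X[k] \\ Q[k] \end{bmatrix},$$

with

$$\gamma = \arctan\left(\sqrt{\frac{1-\mu}{1+\mu}}\right),$$

$$\alpha = \frac{1}{2}\arctan\left(\frac{2\,c\,\mathrm{ICC}}{c^2 - 1}\right),$$

$$\mu = \sqrt{1 + \frac{4\,\mathrm{ICC}^2 - 4}{(c + 1/c)^2}},$$

$$c = 10^{\mathrm{IID}/20}.$$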

If desired, this process is repeated for all three signals L, C, and R, resulting in
a 5.1 (Lf, Rf, Ls, Rs, Cf, LFE) signal. It should be noted that the decorrelation filter H can be
different for different mono-to-stereo processing blocks.
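
One way to build a mixing stage that restores a given IID and ICC is sketched below. It assumes a rotation-plus-scaling mixing matrix of the kind used in parametric stereo decoders, with the mono signal X and its decorrelated copy Q having equal power and zero correlation; the function name `mixing_matrix` and the use of `arctan2` to select the rotation branch are choices made for this sketch, not details taken from the text.

```python
import numpy as np

def mixing_matrix(iid_db, icc):
    """2x2 matrix M such that [Y1, Y2]^T = M [X, Q]^T has the requested IID and ICC."""
    c = 10.0 ** (iid_db / 20.0)                   # amplitude ratio Y1/Y2
    mu = np.sqrt(1.0 + (4.0 * icc ** 2 - 4.0) / (c + 1.0 / c) ** 2)
    gamma = np.arctan(np.sqrt((1.0 - mu) / (1.0 + mu)))   # minor/major component ratio
    alpha = 0.5 * np.arctan2(2.0 * c * icc, c ** 2 - 1.0)  # principal-axis rotation
    rotation = np.array([[np.cos(alpha), -np.sin(alpha)],
                         [np.sin(alpha), np.cos(alpha)]])
    return rotation @ np.diag([np.cos(gamma), np.sin(gamma)])

# For unit-power, mutually orthogonal X and Q the output covariance is M M^T.
M = mixing_matrix(iid_db=6.0, icc=0.5)
cov = M @ M.T
print(10 * np.log10(cov[0, 0] / cov[1, 1]))       # level difference in dB
print(cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])) # coherence
```

The example evaluates the covariance analytically rather than with an actual decorrelation filter, so it only verifies the mixing stage itself.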



Enhancement methods 

The backwards-compatible stereo downmix will usually result in a
significantly reduced spatial image compared to the original 5-channel reconstruction. It is
possible to enhance the stereo downmix in such a way that the perceived spatial image of the
downmix resembles the 5-channel reconstruction more closely by introducing virtual
surround loudspeakers (sometimes denoted a 'stereo widening' algorithm). The
generation of virtual surround loudspeakers is based on cross-talk cancellation principles.
Various methods for stereo widening are currently available. However, these systems have
an important drawback: if they are applied to a stereo downmix they result in a widening of
the complete sound stage instead of a widening of the surround signals only. To overcome
this drawback, we propose a cross-talk cancellation algorithm that is applied to the stereo
downmix, in which the amount of cross-talk cancellation depends on the spatial encoding
parameters. Using this method, (1) only signal parts that would have been reproduced by
surround loudspeakers in a 5-channel setup are processed, and (2) the cross-talk cancellation
algorithm can be inverted, which is a requirement for high-quality 5-channel reconstruction.

System based on mono-downmix

Approach 1 

Encoder 

A straightforward approach for a 5.1-to-1 encoder is depicted in Figure 6. The
encoder consists of subsequent stereo-to-mono analysis blocks, which are identical to those
used in Figure 4.

Figure 6: Structure of 5.1-to-one encoder.

This approach reduces the number of input channels pair-wise using stereo-to-mono
blocks. Each block generates a mono signal and spatial parameters, of which a
selection is transmitted. A recommended parameter selection includes:

Parameter set 1:
IID, ICC for each time/frequency tile;
IPD, OPD for time/frequency tiles up to about 2 kHz.

Parameter set 2:
IID, ICC for each time/frequency tile;
IPD, OPD for time/frequency tiles up to about 2 kHz.

Parameter set 3:
IID only for time/frequency tiles up to about 150 Hz.

Parameter set 4:
IID, ICC for each time/frequency tile;
Optionally: IPD, OPD for time/frequency tiles up to about 2 kHz.

Parameter set 5:
IID for each time/frequency tile.

Decoder

A corresponding decoder performs the reverse of the process depicted in Figure 6. The
individual building blocks are equal to those described in detail in Section 0.

Approach 2

Instead of performing a spatial analysis and downmix on a pair-wise basis, this approach
performs the downmix in a single stage based on multidimensional Principal Component
Analysis (PCA). Assuming 5 complex-valued input signals $X_i$, the complex-valued
covariance matrix $S_{ij}$ of the incoming signals is given by:

$$S_{ij} = \sum_k X_i[k] X_j^*[k].$$

Note that $S_{ij} = S_{ji}^*$. In principle the non-diagonal elements of the covariance matrix (i.e., i not
equal to j) are complex. However, it is possible to extract and transmit the complex angles of
$S_{ij}$ separately, resulting in a covariance matrix containing elements equal to or larger than zero
only. We will describe a real-valued covariance matrix from this point on.

It is assumed that the input signals $X_i$ consist of a rotation (R) of a set of
orthogonal signals $Y_i$ (i.e., $R^{-1} = R^T$):

$$X = RY.$$

Given the fact that the individual signals of $Y_i$ are orthogonal, the covariance
matrix of Y, $S_y$, is diagonal. Consequently, the covariance matrix of X, $S_x$, can be written as:

$$S_x = R S_y R^T.$$

Given the diagonal matrix $S_y$, the above expression equals the
eigenvalue/eigenvector decomposition of $S_x$. This means that the rotation matrix R and the
eigenvalues (which are equal to the energies of the orthogonal signals of Y) can be obtained
by an eigenvalue/eigenvector decomposition of the covariance matrix $S_x$.
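
The relation between the covariance matrix, the rotation matrix and the energies of the orthogonal signals can be illustrated with NumPy's symmetric eigensolver. This is a minimal sketch for the real-valued case; the random five-channel test signal and the function name `pca_rotation` are assumptions for the example.

```python
import numpy as np

def pca_rotation(X):
    """X has shape (channels, bins). Return R, the eigenvalues (descending) and S_x."""
    S_x = X @ X.T                                 # real-valued covariance matrix
    eigvals, R = np.linalg.eigh(S_x)              # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return R[:, order], eigvals[order], S_x

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 256))
R, eigvals, S_x = pca_rotation(X)
Y = R.T @ X                                       # principal components (inverse rotation)
print(np.allclose(R @ np.diag(eigvals) @ R.T, S_x))   # S_x = R S_y R^T
print(np.allclose(Y @ Y.T, np.diag(eigvals)))         # S_y is diagonal
```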

The general encoder approach then consists of:
- computing and applying (relative) phase parameters of the input signals in order to result in a
covariance matrix which contains smaller imaginary parts; a possible technique is applying
complex principal component analysis;
- computing the eigenvalue/eigenvector decomposition of the complex covariance matrix,
resulting from the original or modified input signals;
- (inverse) rotation of the input signals to obtain the principal components;
- transmission of the first principal component only (i.e., the component corresponding to the
largest eigenvalue) as mono output signal;
- transmission of:
  - either the rotation matrix R (e.g. in terms of its angular components) and the (relative)
  powers of each principal component, or the (real- or complex-valued) covariance matrix
  itself in terms of IPDs, correlations and power ratios;
  - the relative complex phase angles;
  - preferably an overall phase shift of the principal component signal.

The decoder consists of the following steps:
- computing the rotation matrix R (either from the direct transmission or from an eigenvalue
decomposition of the transmitted covariance matrix);
- computing a set of orthogonal signals by means of decorrelating the mono input signal;
- scaling the orthogonal signals by means of the transmitted (relative) signal powers;
- generation of a five-channel output by rotating the orthogonal signals;
- re-instating the complex phase relationships based on transmitted (relative, IPD and overall,
OPD) phase information.
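
The real-valued core of these encoder and decoder steps (rotation, transmission of the first principal component, scaling and rotating a set of orthogonal signals) can be sketched as follows. Phase handling is omitted. The decorrelation filters are replaced by a stand-in that orthogonalizes random signals against the mono signal, which is an assumption made only so that the example is self-contained; the function names are hypothetical.

```python
import numpy as np

def pca_encode(X):
    """Return the first principal component, the rotation R and the component powers."""
    eigvals, R = np.linalg.eigh(X @ X.T)
    order = np.argsort(eigvals)[::-1]
    R, eigvals = R[:, order], eigvals[order]
    mono = (R.T @ X)[0]                           # component with the largest eigenvalue
    return mono, R, eigvals

def pca_decode(mono, R, eigvals, rng):
    """Rebuild a multi-channel signal whose covariance matches the original."""
    n_ch, n_bins = R.shape[0], mono.shape[0]
    # Stand-in for decorrelators: random signals made orthogonal to the mono signal.
    A = np.column_stack([mono] + [rng.standard_normal(n_bins) for _ in range(n_ch - 1)])
    Q, _ = np.linalg.qr(A)                        # orthonormal columns, first one along mono
    if Q[:, 0] @ mono < 0:
        Q[:, 0] = -Q[:, 0]                        # undo a possible sign flip from QR
    Y = (Q * np.sqrt(np.maximum(eigvals, 0.0))).T # scale by the transmitted powers
    return R @ Y                                  # rotate back to the channel domain

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 256))
mono, R, eigvals = pca_encode(X)
X_hat = pca_decode(mono, R, eigvals, rng)
print(np.allclose(X_hat @ X_hat.T, X @ X.T))      # covariance structure is preserved
```

The decoded channels are not sample-wise equal to the input; only the first principal component and the inter-channel covariance are preserved, which is what the parametric description aims for.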

Approach 3 

The five-dimensional PCA described in Approach 2 is relatively complex compared to the
cascaded approach described in Approach 1, especially since it is not possible to derive the
eigenvectors analytically. As a compromise between both approaches, the cascaded approach
could be combined with three-dimensional PCA. This is illustrated in Figure 7.

Figure 7: Hybrid encoder approach.



The front (F), surround (S) and centre (C) channels are derived similarly to
Approach 1. Subsequently these signals are used as input to a three-dimensional PCA
procedure resulting in a 3x3 rotation matrix R. The same procedure for encoding and
decoding as described in Approach 2 can be used.

Alternatively, the three channels input to the three-dimensional PCA
procedure could consist of:
- a downmix of Lf and Ls;
- a downmix of Rf and Rs;
- a downmix of Cf and LFE.



Residual-coding extensions 

Most encoder processing blocks reduce the number of input channels,
combined with a parameterization of the relations between the original input channels. In
principle, each reduction in number of channels results in an output signal, but also in a
complementary residual signal (or more residual signals), which would in principle allow
perfect reconstruction of the original input signals. In the decoder, these residual signals are
artificially regenerated by decorrelation filters. It should be noted, however, that it can occur
for certain time/frequency tiles that this synthetic residual signal is inappropriate for a
perceptually high-quality decoder output signal. For these time/frequency tiles, it is possible
to encode and transmit the actual residual signal from the encoder, while for the remaining
time/frequency tiles, the decoder can use the synthetic residual signal. The scalability options
of this approach are of interest; the encoded residual signals can simply be removed from the
bit stream. In that case, the decoder will default to its synthetic residual signal. This approach
can be used for three-to-one, three-to-two, as well as two-to-one processing blocks.



Bit-stream syntax 

As already mentioned in the introduction, the basic techniques used for multi-channel
parametric coding can be applied to code different multi-channel signal configurations. This
is reflected in the proposed bit-stream syntax. Some example possibilities:

- encode an Lf, Rf, Ls, Rs (4 channels) multi-channel signal into a mono or stereo-compatible
signal,
- encode an L(f), C, R(f), LFE (3.1 channels) multi-channel signal into a mono or
stereo-compatible signal, or
- encode an Lf, C, Rf, Ls, Rs, LFE (5.1 channels) multi-channel signal into an L(f), C, R(f)
multi-channel compatible signal.

The bit stream should be very flexible for decoding subsets of the
multi-channel signal. Consider the situation where the original signal consisted of a 5.1
multi-channel signal encoded into a stereo signal using a structure similar to that depicted in Figure 4.
Furthermore, assume that reconstruction is required for a 3.1 setup, consisting of a Left
(front), Right (front), Centre (front) and an LFE speaker. By decoding using only parameter
sets 2 and 4, the 3.1-channel signal is already obtained. Hence, it is not necessary to further
decode the surround channels.

In the bit-stream syntax the flexibility described above is obtained by
explicitly defining the channel configuration of each encoder/decoder step in the
mc_channel_config element. This element describes which parameters belong to
which (intermediate) input and (intermediate) output channels (see Table 16).

Consider the very simple case where the input signal consists of the mono
signal M and the output signal consists of the mono signal M' and the signal LFE. If the
frequency range of the LFE signal is limited with respect to the bandwidth of the mono
signal M, which is typically the case, the signals M' and LFE are obtained only for the lower
part of the frequency range. For the higher part of the frequency range the signal M' is
simply obtained as M'=M. A similar situation might occur in a 1 to 3 decoding step. Consider
the case where the input signal consists of mono signal M and the output signal consists of
the left signal L(f), the right signal R(f) and the LFE signal. For the frequency range for
which the LFE has been included, parameters for all three signals are included, which may
e.g. consist of IID values between L and R and between L and LFE and coherence values
between L and R, L and LFE and R and LFE (3 sets!). For the higher frequency range, the
parameters will then only consist of IID values between L and R and coherence values
between L and R. For this frequency range the decoder operates in a 1 channel to 2 channels
decoding mode.

For the element data1to3() two alternatives are given. The first one uses a
representation containing the parameters of the covariance matrix, split into power ratios and
coherence values. The second alternative uses an angular representation (e.g. as defined by
Euler) of the rotation matrix R.


Table 1 - Syntax of mc_data()

Syntax                                          Num. bits   Mnemonic
mc_data() {
  if (mc_channel_config == 1) {                 1           uimsbf
    mc_channel_config()
  }
  for (s=0; s<nr_steps; s++) {                              Note 1
    switch (method[s]) {
      case 0:
        data1to2()
        break;
      case 1:
        data2to3()
        break;
      case 2:
        data1to3()
        break;
      case 3:
        data1to5()
        break;
      case 4:
        data1to51()
        break;
    }
  }
}

Note 1: nr_steps is defined in the mc_channel_config() element. Hence, for the
first instance of mc_data() mc_channel_config should be set to %1.



Table 2 - Syntax of mc_channel_config()

Syntax                                          Num. bits   Mnemonic
mc_channel_config() {
  nr_downmix_ch_coded                           1           Note 1
  nr_output_ch_coded                            3           Note 2
  nr_steps = 0;
  nr_dec_ch = nr_downmix_ch;
  while (nr_dec_ch < nr_output_ch) {
    for (ch=0; ch<method_in_ch[nr_steps]; ch++) {
      input_ch[nr_steps,ch]                     3           Note 3
    }
    for (ch=0; ch<method_out_ch[nr_steps]; ch++) {
      output_ch[nr_steps,ch]                    3           Note 3
    }
    method[nr_steps]                            3
    nr_dec_ch += nr_increase_ch[method[nr_steps]];
    nr_steps += 1
  }
  num_env_default                               2           Table 15
}

Note 1: nr_downmix_ch = nr_downmix_ch_coded + 1

Note 2: nr_output_ch = nr_downmix_ch + nr_output_ch_coded;

Note 3: See Table 16. LFE can never be input channel!
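
Reading the first two fields of mc_channel_config() and applying Notes 1 and 2 can be sketched as below. The `BitReader` class and the function name are hypothetical helpers written for this example; the field widths (1 and 3 bits) are taken from Table 2, and uimsbf is read as an unsigned integer with the most significant bit first.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (helper for this sketch)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0                              # position in bits

    def read(self, n: int) -> int:
        """Read n bits as an unsigned integer, most significant bit first (uimsbf)."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

def read_channel_counts(reader: BitReader):
    """Parse the two coded channel counts and derive the actual counts."""
    nr_downmix_ch_coded = reader.read(1)
    nr_output_ch_coded = reader.read(3)
    nr_downmix_ch = nr_downmix_ch_coded + 1                  # Note 1
    nr_output_ch = nr_downmix_ch + nr_output_ch_coded        # Note 2
    return nr_downmix_ch, nr_output_ch

# Bits 1 100 ....: stereo down-mix (coded 1), four additional output channels.
print(read_channel_counts(BitReader(bytes([0b11000000]))))
```

With this input the sketch yields 2 down-mix channels and 6 output channels, i.e. a stereo down-mix of a 5.1 signal.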

Table 3 - Syntax of data1to2()

Syntax                                          Num. bits   Mnemonic
data1to2() {
  if (data1to2_header) {                        1
    if (enable_iid) {                           1
      iid_mode                                  ??          uimsbf
    }
    if (enable_icc) {                           1
      icc_mode                                  ??          uimsbf
    }
    if (enable_ipdopd) {                        1
      ipdopd_mode                               ??          uimsbf
    }
    freq_res                                    ??          Table 17
    if (var_framing) {                                      uimsbf
      for (e=0; e<num_env; e++) {                           uimsbf
        env_pos[e]                                          uimsbf
      }
    }
    for (ch=0; ch<method_out_ch[nr_steps]; ch++) {
      if (output_ch[nr_steps,ch]==6) {
        nr_bands_coded                                      Note 1
      }
    }
  }
  if (enable_iid) {
    for (e=0; e<num_env; e++) {
      iid_dt[e]                                             uimsbf
      iid_data(e,nr_bands_coded)
    }
  }
  if (enable_icc) {
    for (e=0; e<num_env; e++) {
      icc_dt[e]                                             uimsbf
      icc_data(e,nr_bands_coded)
    }
  }
  if (enable_ipdopd) {
    for (e=0; e<num_env; e++) {
      ipd_dt[e]                                 1           uimsbf
      ipd_data(e,nr_ipdopd_coded)
      opd_dt[e]                                 1           uimsbf
      opd_data(e,nr_ipdopd_coded)
    }
  }
}

Note 1: In case one of the output channels is the LFE channel only a part of the
signal is coded, denoted with nr_bands_coded.

Table 4 - Syntax of data2to3()

Syntax                                          Num. bits   Mnemonic
data2to3() {
  if (data2to3_header) {                        1
    if (enable_beta) {                          1
      beta_mode                                 ??          uimsbf
    }
    if (enable_gamma) {                         1
      gamma_mode                                ??          uimsbf
    }
    if (enable_opd) {                           1
      opd_mode                                  ??          uimsbf
    }
    freq_res                                    2
    if (var_framing) {                          1           uimsbf
      for (e=0; e<num_env_lcr; e++) {           2           uimsbf
        env_pos_lcr[e]                          5           uimsbf
      }
    } else {
      num_env_lcr = num_env_frame
    }
    for (ch=0; ch<method_out_ch[nr_steps]; ch++) {
      if (output_ch[nr_steps,ch]==6) {
        nr_bands_coded                          ??          Note 1
      }
    }
  }
  if (enable_beta) {
    for (e=0; e<num_env; e++) {
      beta_dt[e]                                1           uimsbf
      beta_data(e,nr_bands_coded)
    }
  }
  if (enable_gamma) {
    for (e=0; e<num_env; e++) {
      gamma_dt[e]                               1           uimsbf
      gamma_data(e,nr_bands_coded)
    }
  }
  if (enable_opd) {
    for (e=0; e<num_env; e++) {
      opd_dt[e]                                 1           uimsbf
      opd_data(e,nr_opd_coded)
    }
  }
}

Note 1: In case one of the output channels is the LFE channel only a part of the
signal is coded, denoted with nr_bands_coded.

Table 5 - Syntax of data1to3() (Alternative A)

Syntax                                          Num. bits   Mnemonic
data1to3() {
  if (data1to3_header) {                        1
    if (enable_iid) {                           1
      iid_mode                                  ??          uimsbf
    }
    if (enable_icc) {                           1
      icc_mode                                  ??          uimsbf
    }
    if (enable_ipdopd) {                        1
      ipdopd_mode                               ??          uimsbf
    }
    freq_res                                    ??          Table 17
    if (var_framing) {                                      uimsbf
      for (e=0; e<num_env; e++) {                           uimsbf
        env_pos[e]                                          uimsbf
      }
    }
    for (ch=0; ch<method_out_ch[nr_steps]; ch++) {
      if (output_ch[nr_steps,ch]==6) {
        nr_bands_coded_set2                                 Note 1
      }
    }
  }
  if (enable_iid) {
    for (e=0; e<num_env; e++) {
      iid_dt[e]                                             uimsbf
      iid_data(e,nr_bands_coded)
      iid_dt[e]                                             uimsbf
      iid_data(e,nr_bands_coded_set2)
    }
  }
  if (enable_icc) {
    for (e=0; e<num_env; e++) {
      icc_dt[e]                                             uimsbf
      icc_data(e,nr_bands_coded)
      icc_dt[e]                                             uimsbf
      icc_data(e,nr_bands_coded_set2)
      icc_dt[e]                                 1           uimsbf
      icc_data(e,nr_bands_coded_set2)
    }
  }
  if (enable_ipdopd) {
    nr = min(nr_bands_coded_set2,nr_ipdopd_coded);
    for (e=0; e<num_env; e++) {
      ipd_dt[e]                                 1           uimsbf
      ipd_data(e,nr_ipdopd_coded)
      ipd_dt[e]                                 1           uimsbf
      ipd_data(e,nr)
      opd_dt[e]                                 1           uimsbf
      opd_data(e,nr)
    }
  }
}

Note 1: In case one of the output channels is the LFE channel only a part of the
signal is decoded to three channels, denoted with nr_bands_coded_set2. The rest
of the signal is decoded to two channels, i.e., nr_bands_coded_set2 is set to
nr_bands_coded. Note that for the bands decoded to three channels there exist two
sets of IIDs, three sets of ICCs, two sets of IPDs and one set of OPDs.
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Table 6 - Syntax of datalto3 (Alternative B) 



Syntax 


Num. 
bits 


Mnemonic 


datalto3{ 

if (datalto3_header) { 

if (enable_angles) { 
angle mode 

} 

if (enable_ipdopd) { 
ipdopd mode 

} 

freq_res 


1 
1 

99 
• • 


uimsbf 


l 

?? 


uimsbf 


2 


Table 17 


if (varJEraming) { 

for (e=0; e< num env; e++) 

{ 

env pos[e] 

} 

} 

for (ch=0; 
ch<method out chpor steps]; ch++) { 

if 

(output_ch[nr_steps,ch] — 6) { 

nr bands coded set2 

} 

} 


1 

2 


uimsbf 
uimsbf 


5 


uimsbf 


ft 

• • 


Note 2 


} 

    if (enable_angles) {
        for (e=0; e<num_env; e++) {
            angle_dt[e]                                 1           uimsbf
            angle_data(e,nr_bands_coded)
            angle_dt[e]                                 1           uimsbf
            angle_data(e,nr_bands_coded_set2)
        }
    }
    if (enable_ipdopd) {
        nr = min(nr_bands_coded_set2,nr_ipdopd_coded);
        for (e=0; e<num_env; e++) {
            ipd_dt[e]                                   1           uimsbf
            ipd_data(e,nr_ipdopd_coded)
            ipd_dt[e]                                   1           uimsbf
            ipd_data(e,nr)
            opd_dt[e]                                   1           uimsbf
            opd_data(e,nr)
        }
    }
}






Note 1: In case one of the output channels is the LFE channel, only a part of the
signal is decoded to three channels, denoted with nr_bands_coded_set2. The rest
of the signal is decoded to two channels, i.e., nr_bands_coded_set2 is set to
nr_bands_coded. Note that for the bands decoded to three channels there exist two
sets of IIDs, three sets of ICCs, two sets of IPDs and one set of OPDs.

The data1to5() and data1to51() elements are straightforward extensions of the
data1to3() element with an increased number of parameter sets.



Table 7 - Syntax of iid_data()

Syntax                                                          Num. bits   Mnemonic

iid_data(e, nr_iid_par) {
    if (iid_dt[e]) {
        for (b=0; b<nr_iid_par; b++) {
            iid_par_dt[e][b] =
                sa_huff_dec(huff_iid_dt[iid_quant],bs_codeword);    ??          bslbf
        }
    }
    else {
        for (b=0; b<nr_iid_par; b++) {
            iid_par_df[e][b] =
                sa_huff_dec(huff_iid_df[iid_quant],bs_codeword);    ??          bslbf
        }
    }
}
Table 8 - Syntax of icc_data()

Syntax                                                          Num. bits   Mnemonic

icc_data(e, nr_icc_par) {
    if (icc_dt[e]) {
        for (b=0; b<nr_icc_par; b++) {
            icc_par_dt[e][b] =
                sa_huff_dec(huff_icc_dt,bs_codeword);               ??          bslbf
        }
    }
    else {
        for (b=0; b<nr_icc_par; b++) {
            icc_par_df[e][b] =
                sa_huff_dec(huff_icc_df,bs_codeword);               ??          bslbf
        }
    }
}






Table 9 - Syntax of ipd_data()

Syntax                                                          Num. bits   Mnemonic

ipd_data(e, nr_ipd_par) {
    if (ipd_dt[e]) {
        for (b=0; b<nr_ipdopd_par; b++) {
            ipd_par_dt[e][b] =
                sa_huff_dec(huff_ipd_dt,bs_codeword);               ??          bslbf
        }
    }
    else {
        for (b=0; b<nr_ipdopd_par; b++) {
            ipd_par_df[e][b] =
                sa_huff_dec(huff_ipd_df,bs_codeword);               ??          bslbf
        }
    }
}
Table 10 - Syntax of opd_data()

Syntax                                                          Num. bits   Mnemonic

opd_data(e, nr_opd_par) {
    if (opd_dt[e]) {
        for (b=0; b<nr_ipdopd_par; b++) {
            opd_par_dt[e][b] =
                sa_huff_dec(huff_opd_dt,bs_codeword);               ??          bslbf
        }
    }
    else {
        for (b=0; b<nr_ipdopd_par; b++) {
            opd_par_df[e][b] =
                sa_huff_dec(huff_opd_df,bs_codeword);               ??          bslbf
        }
    }
}



Table 11 - Syntax of beta_data()

Syntax                                                          Num. bits   Mnemonic

beta_data(e, nr_beta_par) {
    if (beta_dt[e]) {
        for (b=0; b<nr_beta_par; b++) {
            beta_par_dt[e][b] =
                sa_huff_dec(huff_beta_dt,bs_codeword);              ??          bslbf
        }
    }
    else {
        for (b=0; b<nr_beta_par; b++) {
            beta_par_df[e][b] =
                sa_huff_dec(huff_beta_df,bs_codeword);              ??          bslbf
        }
    }
}



PHNL0403 8 8EPQ 

26 02.04.2004 



Table 12 - Syntax of gamma_data()

Syntax                                                          Num. bits   Mnemonic

gamma_data(e, nr_gamma_par) {
    if (gamma_dt[e]) {
        for (b=0; b<nr_gamma_par; b++) {
            gamma_par_dt[e][b] =
                sa_huff_dec(huff_gamma_dt,bs_codeword);             ??          bslbf
        }
    }
    else {
        for (b=0; b<nr_gamma_par; b++) {
            gamma_par_df[e][b] =
                sa_huff_dec(huff_gamma_df,bs_codeword);             ??          bslbf
        }
    }
}


Table 13 - Syntax of angle_data()

Syntax                                                          Num. bits   Mnemonic

angle_data(e, nr_angle_par) {
    for (a=0; a<nr_angles; a++) {                                               Note 1
        if (angle_dt[a,e]) {
            for (b=0; b<nr_angle_par; b++) {
                angle_par_dt[a][e][b] =
                    sa_huff_dec(huff_gamma_dt,bs_codeword);         ??          bslbf
            }
        }
        else {
            for (b=0; b<nr_angle_par; b++) {
                angle_par_df[a][e][b] =
                    sa_huff_dec(huff_gamma_df,bs_codeword);         ??          bslbf
            }
        }
    }
}

Note 1: nr_angles follows from the method used.
Table 14: Dependencies of variable method

method   description   method_in_ch   method_out_ch   nr_increase_ch
0        1 to 2        1              2               1
1        2 to 3        2              3               1
2        1 to 3        1              3               2
3        1 to 5        0 (fixed)      0 (fixed)       4
4        1 to 5.1      0 (fixed)      0 (fixed)       5
5        reserved
6        reserved
7        reserved


Table 15: num_env as a function of num_env_default

num_env_default   num_env
0                 0
1                 1
2                 2
3                 4



Table 16: Channel description

input_ch / output_ch   description             abbreviation
0                      mono                    M
1                      left (front)            L(f)
2                      right (front)           R(f)
3                      left surround           Ls
4                      right surround          Rs
5                      centre (or front)       C(F)
6                      low frequency effects   LFE
7                      surround                S



Table 17: nr_bands and nr_bands_coded as a function of freq_res

freq_res   nr_bands   nr_bands_coded (per default)
0          10         10
1          20         20
2          34         34
3          reserved
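
As an illustration of how a decoder might use Tables 15 and 17, the following minimal sketch maps the transmitted codes to envelope and band counts. The function and dictionary names are hypothetical and are not part of the bit-stream syntax; only the numeric values are taken from the tables.

```python
# Values transcribed from Table 15 (num_env_default -> num_env).
NUM_ENV = {0: 0, 1: 1, 2: 2, 3: 4}

# Values transcribed from Table 17 (freq_res -> nr_bands); None marks the reserved code.
NR_BANDS = {0: 10, 1: 20, 2: 34, 3: None}


def bands_for(freq_res):
    """Return (nr_bands, nr_bands_coded) for a 2-bit freq_res value."""
    nr_bands = NR_BANDS[freq_res]
    if nr_bands is None:
        raise ValueError("freq_res value 3 is reserved")
    # Per default nr_bands_coded equals nr_bands (third column of Table 17).
    return nr_bands, nr_bands


def ipdopd_band_count(nr_bands_coded_set2, nr_ipdopd_coded):
    """nr = min(nr_bands_coded_set2, nr_ipdopd_coded), as used in data1to3."""
    return min(nr_bands_coded_set2, nr_ipdopd_coded)
```

For example, `bands_for(1)` gives 20 bands, and a frame with `num_env_default` equal to 3 carries `NUM_ENV[3]`, i.e. 4, envelopes.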
Annex A Parametric multichannel audio coder with 2, 3, 4 and 5 channel playback
compatibility

This invention is a multi-channel extension of the basic principle described in
WO03/090208-A1. The 5-channel content is downmixed into 2 channels combined with a
small amount of parametric overhead, which enables 5-channel reconstruction at the decoder
side. Moreover, 2-, 3- and 4-channel reproduction are also supported.

The important features of the proposed coder are:

- transmission of two audio channels (which can be encoded using an arbitrary stereo audio
codec), preferably obtained using principal component analysis on the left-front and
left-rear signal pair on the one hand, and using a separate principal component analysis on
the right-front and right-rear signal pair on the other hand;

- transmission of parametric overhead, which includes:

- inter-channel level differences between left-front and left-rear channels;

- inter-channel level differences between right-front and right-rear channels;

- inter-channel coherence values between left-front and left-rear channels;

- inter-channel coherence values between right-front and right-rear channels;

- the power ratio between the centre channel and the sum of the powers of the left-front,
left-rear, right-front and right-rear channels.

Additionally, the inter-channel phase differences and overall phase differences
between left-front and left-rear on the one hand, and right-front and right-rear on the other
hand, may also be included in the parametric bit stream.

The parameters described above are typically analysed as a function of time
and frequency (i.e., for a set of time/frequency tiles).

Encoder 

Assume a five-channel audio signal l_f[n], l_r[n], r_f[n], r_r[n], c[n], which describes the
discrete time-domain waveforms for the left-front, left-rear, right-front, right-rear and
centre signals, respectively. These five signals are segmented using a common segmentation,
preferably using overlapping analysis windows. Subsequently, each segment is converted to the
frequency domain using a complex transform (e.g., FFT). However, complex filter-bank
structures may also be appropriate to obtain time/frequency tiles. This process results in
segmented, sub-band representations of the input signals (which will be denoted by L_f[k],
L_r[k], R_f[k], R_r[k] and C[k], with k denoting the frequency index).
As a first step, the relevant parameters between left-front and left-rear are
estimated. These parameters include the level difference (IID_L), the (average) phase
difference (IPD_L) and the coherence (ICC_L):

$$\mathrm{IID}_L = 10 \log_{10} \left( \frac{\sum_k L_f[k] L_f^*[k]}{\sum_k L_r[k] L_r^*[k]} \right)$$

$$\mathrm{IPD}_L = \angle \left( \sum_k L_f[k] L_r^*[k] \right)$$

$$\mathrm{ICC}_L = \frac{\left| \sum_k L_f[k] L_r^*[k] \right|}{\sqrt{\sum_k L_f[k] L_f^*[k] \, \sum_k L_r[k] L_r^*[k]}}$$

This process is repeated for the right-front and right-rear channels, resulting in
IID_R, IPD_R and ICC_R.
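
The parameter analysis for one time/frequency tile can be sketched as follows. This is a minimal illustration, not the reference implementation: the function name is hypothetical, the tile is passed as two lists of complex bins, and the coherence is assumed to be the magnitude of the normalised cross-spectrum.

```python
import cmath
import math


def stereo_params(a, b):
    """Return (IID in dB, IPD in radians, ICC) for two lists of complex bins.

    a and b hold the bins of one time/frequency tile, e.g. L_f[k] and L_r[k].
    """
    p_a = sum(abs(x) ** 2 for x in a)                       # sum_k A[k] A*[k]
    p_b = sum(abs(y) ** 2 for y in b)                       # sum_k B[k] B*[k]
    cross = sum(x * y.conjugate() for x, y in zip(a, b))    # sum_k A[k] B*[k]
    iid = 10.0 * math.log10(p_a / p_b)
    ipd = cmath.phase(cross)
    icc = abs(cross) / math.sqrt(p_a * p_b)                 # assumed: magnitude
    return iid, ipd, icc
```

If the rear bins are a copy of the front bins at half the amplitude and delayed in phase by 0.3 rad, the sketch returns an IID of about 6.02 dB, an IPD of 0.3 rad and an ICC of 1.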

The second step consists of a principal component analysis (PCA) of the two
signals left-front (L_f) and left-rear (L_r). To be more specific, these two input signals are
rotated in order to obtain a dominant signal (Y[k]) and a residual signal (Q[k]), using a
rotation angle α which maximizes the energy of the dominant signal:

$$\begin{bmatrix} Y[k] \\ Q[k] \end{bmatrix} =
\begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix}
\begin{bmatrix} L_f[k] \exp(j(-\mathrm{OPD}_L)) \\ L_r[k] \exp(j(-\mathrm{OPD}_L + \mathrm{IPD}_L)) \end{bmatrix}$$

Here, the angle OPD_L denotes an overall phase rotation angle, while IPD_L
ensures maximum phase-alignment of the two signals L_f and L_r. The rotation angle α can be
derived from the IID_L and ICC_L parameters following

$$\alpha = \frac{1}{2} \arctan \left( \frac{2 g \, \mathrm{ICC}_L}{g^2 - 1} \right)$$

with

$$g = 10^{\mathrm{IID}_L / 20}$$
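
The relation between the rotation angle and the level and coherence parameters, tan(2α) = 2·g·ICC / (g² − 1) with g = 10^(IID/20), can be tried out with the sketch below. It is an illustration under stated assumptions: the function names are hypothetical, the two inputs are taken to be phase-aligned already, and `atan2` is used instead of a plain arctangent so that the angle falls in the branch that maximizes (rather than minimizes) the dominant energy.

```python
import math


def pca_angle(iid_db, icc):
    """Rotation angle alpha from a level difference (dB) and a coherence value."""
    g = 10.0 ** (iid_db / 20.0)
    # atan2 keeps the quadrant of (g*g - 1, 2*g*icc), which selects the
    # energy-maximizing solution of tan(2*alpha) = 2*g*icc / (g*g - 1).
    return 0.5 * math.atan2(2.0 * g * icc, g * g - 1.0)


def rotate(x, y, alpha):
    """Apply the 2x2 rotation to two phase-aligned signals; returns (dominant, residual)."""
    c, s = math.cos(alpha), math.sin(alpha)
    dominant = [c * a + s * b for a, b in zip(x, y)]
    residual = [-s * a + c * b for a, b in zip(x, y)]
    return dominant, residual
```

For two fully coherent signals where the second is the first at half the amplitude (IID of about 6.02 dB, ICC of 1), the residual returned by `rotate` vanishes and all energy ends up in the dominant signal.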
The signal Q[k] is subsequently discarded, and signal Y[k] is scaled by a scalar
β to obtain L[k] in such a way that L[k] has the same power as the power of Q[k] plus the
power of Y[k] (i.e., the signal Q[k] is discarded while the loss in signal power is compensated
for by scaling of Y[k]). It can be shown that the required scale factor β is equal to:




This process is also repeated for the right-front and right-rear signal pair,
resulting in signal R[k].

The last step entails mixing the centre signal C[k] into both L[k] and R[k],
resulting in the stereo output signal pair L_OUT[k], R_OUT[k]:



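$$\begin{bmatrix} L_{OUT}[k] \\ R_{OUT}[k] \end{bmatrix} =
\begin{bmatrix} L[k] + \varepsilon C[k] \\ R[k] + \varepsilon C[k] \end{bmatrix}$$

Here, ε denotes a weight that determines the strength of C[k] in the downmix
(typically 0.707). A parameter IID_C that describes the power of C with respect to the power
of L and R is extracted:

$$\mathrm{IID}_C = 10 \log_{10} \left( \frac{\sum_k C[k] C^*[k]}{\sum_k L[k] L^*[k] + \sum_k R[k] R^*[k]} \right)$$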
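
The centre mixing step can be sketched as below. The function names are hypothetical, and the definition used for IID_C (power of C divided by the summed powers of L and R, in dB) is an assumption based on the verbal description; the exact normalisation used by the coder may differ.

```python
import math


def mix_centre(l_bins, r_bins, c_bins, eps=0.707):
    """Mix the centre bins into both down-mix channels with weight eps."""
    l_out = [a + eps * c for a, c in zip(l_bins, c_bins)]
    r_out = [b + eps * c for b, c in zip(r_bins, c_bins)]
    return l_out, r_out


def iid_c(l_bins, r_bins, c_bins):
    """Assumed definition: 10*log10(P_C / (P_L + P_R)) for one tile."""
    p_c = sum(abs(c) ** 2 for c in c_bins)
    p_lr = sum(abs(a) ** 2 for a in l_bins) + sum(abs(b) ** 2 for b in r_bins)
    return 10.0 * math.log10(p_c / p_lr)
```

With a centre signal whose power equals the combined power of L and R, `iid_c` returns 0 dB.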



The process as described above is repeated for each time/frequency tile.
Subsequently, the signals L_OUT[k] and R_OUT[k] are transformed to the time domain and
combined with previous segments using overlap-add, resulting in the output signals l_OUT[n]
and r_OUT[n]. A schematic overview of the encoder is shown in Fig. A1.



[Figure: block diagram. The inputs l_f, l_r, r_f and r_r pass through "Segment and transform"
blocks; each front/rear pair feeds a "Parameter analysis" block (Parameter Set 1 and
Parameter Set 2), a "Parameter to PCA angle" block and a "PCA rotation" block; the block
"Mixing and parameter extraction" outputs Parameter Set 3 and the down-mix, which is
converted by "Inverse transform and OLA" into l_OUT and r_OUT.]

Fig. A1. Schematic overview of the encoder

Decoder for stereo playback 

For stereo playback, the transmitted signals l_OUT[n] and r_OUT[n] are reproduced
over the two playback channels, without further processing.

Decoder for 3-channel playback

For 3-channel playback, the two received channels l_OUT[n] and r_OUT[n] are
segmented and transformed to the frequency domain. Subsequently, the output signals L[k],
R[k] and C[k] are obtained as follows:



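$$\begin{bmatrix} L[k] \\ R[k] \\ C[k] \end{bmatrix} =
\begin{bmatrix} w_L L_{OUT}[k] \\ w_R R_{OUT}[k] \\ w_{LC} L_{OUT}[k] + w_{RC} R_{OUT}[k] \end{bmatrix}$$

with weights w_L, w_R, w_LC and w_RC whose expressions involve the term

$$\varepsilon^2 + 10^{-\mathrm{IID}_C / 10}$$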

Decoder for 5-channel playback

For five-channel playback, first the 3-channel playback reconstruction as
described above is performed, resulting in signals L[k], R[k] and C[k]. The next step entails
splitting L[k] into L_f[k] and L_r[k], and splitting R[k] into R_f[k] and R_r[k]. This
splitting process is performed using the inverse of the PCA rotation used in the encoder. The
dominant signal Y[k] and residual signal Q[k], which are required for the inverse PCA
rotation, are obtained as follows:

$$\begin{bmatrix} Y[k] \\ Q[k] \end{bmatrix} =
\begin{bmatrix} L[k] \cos\gamma \\ H[k] L[k] \sin\gamma \end{bmatrix}$$

Here, H[k] denotes an all-pass decorrelation filter to obtain a decorrelated
version of L[k]. The angle γ is given by:

$$\gamma = \arctan(\ldots)$$

Subsequently, the signals L_f[k] and L_r[k] are obtained using inverse PCA:
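$$\begin{bmatrix} L_f[k] \\ L_r[k] \end{bmatrix} =
\begin{bmatrix} \exp(j \, \mathrm{OPD}_L) & 0 \\ 0 & \exp(j(\mathrm{OPD}_L - \mathrm{IPD}_L)) \end{bmatrix}
\begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}
\begin{bmatrix} Y[k] \\ Q[k] \end{bmatrix}$$

This process is then repeated for the right channel.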



Decoder for 4-channel playback

The decoder for 4-channel playback can be simply obtained by first decoding 5 channels (l_f,
l_r, c, r_f and r_r), followed by mixing of the centre channel (c) into front left and front
right:

l_f,playback = l_f + 0.707 c
r_f,playback = r_f + 0.707 c

The factor 0.707 ensures that the total power of the centre channel is constant,
independent of playback through the single centre speaker or as a phantom source created by
left front and right front. The surround channels remain unchanged (i.e., the signals are the
same as for 5-channel playback).
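
The 4-channel fold-down described above amounts to two weighted additions per sample. A minimal sketch, with a hypothetical function name and the channels passed as plain lists of samples:

```python
def four_channel_playback(lf, lr, c, rf, rr, weight=0.707):
    """Fold the decoded centre channel into front left and front right.

    Returns (lf_playback, lr, rf_playback, rr); the rear channels pass through.
    """
    lf_play = [a + weight * b for a, b in zip(lf, c)]
    rf_play = [a + weight * b for a, b in zip(rf, c)]
    return lf_play, lr, rf_play, rr


# The centre power is split over two speakers: 2 * 0.707**2 is close to 1,
# so the phantom centre carries about the same power as a real centre speaker.
CENTRE_POWER_GAIN = 2 * 0.707 ** 2
```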
Annex B Coding the low frequency effects (LFE) channel in parametric multichannel audio
coding

The multi-channel audio coder described on the previous pages does not
support the coding of a low frequency effects (LFE) channel, which is incorporated in a
standard commonly known as 5.1.

The proposed coder incorporates the LFE channel. To this end, additional
parameters need to be sent to the decoder. At the same time, the 2-channel down-mix is
modified to enable the decoding of the LFE channel. This is done in such a way that the good
quality of the stereo image of the 2-channel down-mix is preserved.

The parameters are typically analysed as a function of time and frequency (i.e.,
for a set of time/frequency tiles). The bandwidth of the LFE is typically limited to
approximately 120 Hz. The proposed solution allows for a variable bandwidth of the LFE
channel.

A multichannel coder is described in Annex A. A schematic overview of its
encoder is shown in Fig. B1. In the following, only the changes required for incorporating the
LFE channel are explained.

Encoder 

In Fig. B2, a schematic overview of the encoder incorporating the subwoofer
channel is shown.

Assume that c[n] and lfe[n] describe the discrete time-domain waveforms for
the centre and the LFE signal respectively. The signal Cs and the parameters from the block
'parameter analysis' are obtained from signals c and lfe in the same way in which the signal L
and the parameters from the block 'parameter analysis' are obtained from signals l_f and l_r,
as described in Annex A. There is, however, a difference. Because of the low-frequency
behaviour of the signal lfe, this is done only for a limited number of frequency subbands. To
this end, a parameter describing the number of frequency subbands occupied by the signal lfe
is incorporated in parameter set 4. In the remaining higher subbands, only the signal C is
transmitted. The other parameters included in parameter set 4 are the level difference (IID)
and possibly the phase difference (IPD) between the centre and the LFE channel. If the IPD
is sent, the OPD parameter also needs to be transmitted.



[Figure: block diagram of the Annex A encoder, with "Segment and transform", "Parameter
analysis" (Parameter Set 1 and Parameter Set 2), "Parameter to PCA angle", "PCA rotation",
"Mixing and parameter extraction" (Parameter Set 3) and "Inverse transform and OLA" blocks.]

Fig. B1. Schematic overview of the encoder described in Annex A



Decoder for 5.1-channel playback

For 5.1-channel playback, the signals C[k] and LFE[k] are obtained from the
reconstructed signal Cs[k] similar to the way in which signals L_f[k] and L_r[k] are obtained
from L[k] as described in Annex A. The values of the not transmitted parameter ICC are set
to 1. If the IPD and OPD are not transmitted, they are set to 0.

Fig. B2. Schematic overview of the encoder including an LFE channel
Annex C Reducing the inter-channel interference

The multi-channel audio coder described in Annexes A and B aims at:

- Approximating the multi-channel audio with two channels and parametric overhead. This is
done for low bit rate reasons.

- Backwards compatibility with 2-, 3-, 4- and 5-channel reproduction systems. To
this end a good quality of the stereo image of the 2 transmitted channels is required.

The multichannel audio coder described in Annexes A and B exhibits a
significant amount of inter-channel interference, mainly due to the demand for a good stereo
image of the 2-channel down-mix. The current invention significantly reduces the
inter-channel interference. In this way, the quality of the reproduced multi-channel audio is
significantly enhanced.

The coder proposed in this Annex C represents N input channels by 2 down-mix
channels and parametric overhead. In order to get the best possible reconstruction, in the
sense of least-square-errors, of the N input channels at the decoder using only 2 channels,
Principal Component Analysis (PCA) should be used. Perfect reconstruction of the N input
channels at the decoder is only possible if all N channels from PCA are used. A drawback of
PCA is the fact that no control can be exerted over the 2 down-mix channels. This means that
the abovementioned requirement regarding the good quality of the stereo image is not met
when employing PCA.

As in the case of PCA, also in the case of employing 2 down-mix channels
that do have a good quality of the stereo image, a perfect reconstruction at the decoder is only
possible when these 2 down-mix channels are extended with an appropriate set of N-2
channels. As opposed to PCA, whose N channels are orthogonal so that the N-2 discarded
channels cannot be predicted using the 2 down-mix channels, now the N-2 channels can - to
some extent - be predicted from the 2 down-mix channels. The proposed coder exploits this
predictability at the decoder. In order to do so, parameters need to be sent to the decoder.
These parameters are typically analysed as a function of time and frequency (i.e., for a set of
time/frequency tiles).

As compared to Annex A or Annex B the following changes are made in the
proposed coder:

- at the encoder: the parameters required at the decoder have to be computed.

- in the bit-stream: the parameters required at the decoder have to be included.

- at the decoder: a parameter-dependent up-mix from 2 to 5 channels is performed (in
Annex A, a fixed up-mix is performed).
Encoder

Assume an N-channel audio signal, where z_1[n], z_2[n], ..., z_N[n] describe the
discrete time-domain waveforms of the N channels. These N signals are segmented using a
common segmentation, preferably using overlapping analysis windows. Subsequently, each
segment is converted to the frequency domain using a transform (e.g., FFT). However,
filter-bank structures may also be appropriate to obtain time/frequency tiles. This process
results in segmented, sub-band representations of the input signals, which will be denoted by
Z_1[k], Z_2[k], ..., Z_N[k], with k denoting the frequency index.

From these N channels, 2 down-mix channels are created, being L_0[k] and
R_0[k], which are also segmented, sub-band representations. Each down-mix channel is a
linear combination of the N input signals:

$$L_0[k] = \sum_{i=1}^{N} \alpha_i Z_i[k], \qquad R_0[k] = \sum_{i=1}^{N} \beta_i Z_i[k]$$

The parameters α_i and β_i can be set on the basis of a certain criterion. In ID 695741, this
criterion is the good stereo image of the stereo signal consisting of L_0[k] and R_0[k].

Perfect reconstruction of the N input channels at the decoder is only possible
when these 2 down-mix channels are extended with an appropriate set of N-2 channels. As
opposed to PCA, whose N channels are orthogonal so that the N-2 discarded channels cannot
be predicted using the 2 most relevant channels, now the N-2 channels can - to some extent -
be predicted from the 2 down-mix channels.

If the N-2 discarded channels are denoted by C_{0,i}[k], then these channels are
predicted from the two down-mix channels by:

$$\hat{C}_{0,i}[k] = C_{1,i} L_0[k] + C_{2,i} R_0[k]$$

For choosing parameters C_{1,i} and C_{2,i} various optimisation criteria are possible. We
choose as an optimisation criterion the minimal Euclidian norm of the difference of signal
C_{0,i}[k] and its estimation Ĉ_{0,i}[k]. Parameters C_{1,i} and C_{2,i} need to be sent to
the decoder.

It can be shown that the parameters C_{1,i} and C_{2,i} are related to the parameters
that are obtained when minimising the Euclidian norm of the difference of the original input
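channel Z_i[k] and its estimation at the decoder Ẑ_i[k]. A coder that uses these latter
parameters is further described.

The square of the Euclidian norm of the difference of the original input
channel Z_i[k] and its estimation at the decoder Ẑ_i[k] can be written as:

$$\sum_k \left| Z_i[k] - \hat{Z}_i[k] \right|^2,$$

with

$$\hat{Z}_i[k] = C_{1,Z_i} L_0[k] + C_{2,Z_i} R_0[k].$$

Minimisation of $\sum_k |Z_i[k] - \hat{Z}_i[k]|^2$ leads to the following expressions:

$$C_{1,Z_i} = \frac{\langle L_0[k], Z_i[k] \rangle^* \, \|R_0[k]\|^2 - \langle R_0[k], Z_i[k] \rangle^* \, \langle L_0[k], R_0[k] \rangle^*}{\|L_0[k]\|^2 \|R_0[k]\|^2 - |\langle L_0[k], R_0[k] \rangle|^2}$$

$$C_{2,Z_i} = \frac{\langle R_0[k], Z_i[k] \rangle^* \, \|L_0[k]\|^2 - \langle L_0[k], Z_i[k] \rangle^* \, \langle L_0[k], R_0[k] \rangle}{\|L_0[k]\|^2 \|R_0[k]\|^2 - |\langle L_0[k], R_0[k] \rangle|^2}$$

with

$$\|A[k]\|^2 = \sum_k |A[k]|^2,$$

$$\langle A[k], B[k] \rangle = \sum_k A[k] B^*[k].$$

For the coefficients C_{1,Z_i} and C_{2,Z_i}, the following relations can be derived, with
α_i and β_i the weights of channel Z_i[k] in the down-mix channels L_0[k] and R_0[k]
respectively:

$$\sum_{i=1}^{N} \alpha_i C_{1,Z_i} = 1, \qquad \sum_{i=1}^{N} \alpha_i C_{2,Z_i} = 0,$$

$$\sum_{i=1}^{N} \beta_i C_{1,Z_i} = 0, \qquad \sum_{i=1}^{N} \beta_i C_{2,Z_i} = 1.$$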
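
The least-squares solution can be checked numerically with the sketch below. The function names are hypothetical, and the inner product is taken as the sum over k of A[k] times the complex conjugate of B[k]. The usage example builds a down-mix L_0 = L + Cs, R_0 = R + Cs from three arbitrary test channels and confirms that the resulting coefficients satisfy the four linear relations, which is why only 2N-4 of them need to be transmitted.

```python
def inner(a, b):
    """<A, B> = sum_k A[k] * conj(B[k])."""
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))


def prediction_coefficients(l0, r0, z):
    """Least-squares C1, C2 such that z[k] is approximated by C1*l0[k] + C2*r0[k]."""
    ll = inner(l0, l0).real          # ||L0||^2
    rr = inner(r0, r0).real          # ||R0||^2
    lr = inner(l0, r0)               # <L0, R0>
    zl = inner(z, l0)                # <Z, L0>, i.e. conj(<L0, Z>)
    zr = inner(z, r0)                # <Z, R0>, i.e. conj(<R0, Z>)
    det = ll * rr - abs(lr) ** 2
    c1 = (zl * rr - zr * lr.conjugate()) / det
    c2 = (zr * ll - zl * lr) / det
    return c1, c2


# Arbitrary test tile (hypothetical values) with three channels L, R and Cs.
L = [1.0, 0.0, 2.0, -1.0]
R = [0.0, 1.0, 1.0, 3.0]
CS = [1.0, 1.0, 0.0, 2.0]
L0 = [a + c for a, c in zip(L, CS)]
R0 = [b + c for b, c in zip(R, CS)]
C1_L, C2_L = prediction_coefficients(L0, R0, L)
C1_R, C2_R = prediction_coefficients(L0, R0, R)
C1_CS, C2_CS = prediction_coefficients(L0, R0, CS)
```

Because the down-mix channels themselves are predicted exactly from L_0 and R_0, the sums C1_L + C1_CS and C2_R + C2_CS equal 1, while C2_L + C2_CS and C1_R + C1_CS equal 0.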
Having N channels, with 2 parameters per channel (C_{1,Z_i} and C_{2,Z_i} for the i-th
channel), but at the same time 4 equations describing relations between these parameters,
2N-4 parameters need to be sent to the decoder.

The process described above is repeated for each time/frequency tile.
Subsequently, the signals L_0[k] and R_0[k] are transformed to the time domain and combined
with previous segments using overlap-add, resulting in the output signals l_0[n] and r_0[n].

Summarising, the encoder sends to the decoder 2 down-mix channels, l_0[n] and r_0[n], and
for each time/frequency tile 2N-4 parameters that describe how to retrieve the input channels
from the 2 down-mix channels.

Decoder

At the decoder side, for each time/frequency tile, first the coefficients C_{1,Z_i} and
C_{2,Z_i} are computed for all N channels, using the 2N-4 coefficients that are transmitted
and the 4 equations describing relations between the coefficients. Then each input channel
Z_i[k] is approximated by Ẑ_i[k], with

$$\hat{Z}_i[k] = C_{1,Z_i} L_0[k] + C_{2,Z_i} R_0[k],$$

where L_0[k] and R_0[k] are the received 2 down-mix channels.

Incorporation of the coder in the multi-channel coder described in Annex A or
Annex B

A schematic overview of the encoder of the multi-channel coder described in
Annex A and Annex B is given in Fig. A1 and Fig. B2 respectively. The coder described in
this Annex C can be used to replace the block called "Mixing And Parameter extraction",
which has as inputs the channels L, R and Cs and as outputs the channels L_0 and R_0 and
parameter set 3. In order to get a good stereo image of the 2 down-mix channels L_0 and R_0,
they are chosen as:

L_0[k] = L[k] + Cs[k],
R_0[k] = R[k] + Cs[k].

Because of the three input channels (hence N = 3), only 2N-4, or 2, parameters
need to be transmitted to the decoder. It is advantageous to transmit 2 parameters that have
the same range (e.g. C_{1,L} and C_{2,R}), so that the same quantisation can be applied to them.

At the decoder side, for 3 or more channel playback, first all 6 parameters (C_{1,L},
C_{2,L}, C_{1,R}, C_{2,R}, C_{1,Cs} and C_{2,Cs}) are computed using the 2 transmitted
parameters and the relations between the 6 parameters. For example, if C_{1,L} and C_{2,R}
are transmitted, then it
follows that C_{2,L} = C_{2,R} - 1, C_{1,R} = C_{1,L} - 1, C_{1,Cs} = 1 - C_{1,L} and
C_{2,Cs} = 1 - C_{2,R}. The output signals L[k], R[k] and Cs[k] are obtained as follows:

$$\begin{bmatrix} L[k] \\ R[k] \\ Cs[k] \end{bmatrix} =
\begin{bmatrix} C_{1,L} L_0[k] + C_{2,L} R_0[k] \\ C_{1,R} L_0[k] + C_{2,R} R_0[k] \\ C_{1,Cs} L_0[k] + C_{2,Cs} R_0[k] \end{bmatrix}$$
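
For the three-channel case, the step from the 2 transmitted parameters to the 6 up-mix coefficients and then to the output signals can be sketched as follows. The function names and the dictionary layout are hypothetical; the four relations are the ones stated in the text for the choice L_0 = L + Cs, R_0 = R + Cs.

```python
def expand_parameters(c1_l, c2_r):
    """Derive all six coefficients from the transmitted pair (C1_L, C2_R)."""
    c2_l = c2_r - 1.0
    c1_r = c1_l - 1.0
    c1_cs = 1.0 - c1_l
    c2_cs = 1.0 - c2_r
    return {"L": (c1_l, c2_l), "R": (c1_r, c2_r), "Cs": (c1_cs, c2_cs)}


def upmix(l0, r0, params):
    """Each output channel is C1 * L0[k] + C2 * R0[k] with its own (C1, C2)."""
    return {
        name: [c1 * a + c2 * b for a, b in zip(l0, r0)]
        for name, (c1, c2) in params.items()
    }
```

As a usage example, `upmix([4.0], [8.0], expand_parameters(0.75, 0.75))` yields L = 1, R = 5 and Cs = 3, so that L + Cs and R + Cs reproduce the two down-mix values 4 and 8.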





Playback of 4 or 5 channels is explained in Annex A.
Annex D Improved stereo coding

Traditional coding schemes, e.g. MPEG-1 Layer III (mp3, [1]), employ
stereo coding tools to improve the coding efficiency. One of these coding tools is known as
Mid/Side (M/S) stereo coding or Sum/Difference stereo coding [2]. Using M/S coding, a
stereo signal consisting of a left signal l[n] and a right signal r[n] is coded as a sum signal
m[n] and a difference signal s[n]¹:

m[n] = r[n] + l[n]
s[n] = r[n] - l[n]

For (almost) identical signals l[n] and r[n] this gives a large coding gain, as
the corresponding difference signal s[n] is close to being zero, whereas the sum signal
contains practically all the signal energy. Hence, in this situation the bit rate required for
coding the sum and difference signals is close to the bit rate required for coding only a single
channel.
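
The sum/difference mapping and its inverse can be sketched in a few lines. The function names are hypothetical; the mapping follows the convention used here, i.e. without the scaling constant and with s[n] = r[n] - l[n].

```python
def ms_encode(left, right):
    """m[n] = r[n] + l[n], s[n] = r[n] - l[n]."""
    m = [r + l for l, r in zip(left, right)]
    s = [r - l for l, r in zip(left, right)]
    return m, s


def ms_decode(m, s):
    """Invert the mapping: l[n] = (m[n] - s[n]) / 2, r[n] = (m[n] + s[n]) / 2."""
    left = [(a - b) / 2.0 for a, b in zip(m, s)]
    right = [(a + b) / 2.0 for a, b in zip(m, s)]
    return left, right
```

For nearly identical inputs such as left = [1.0, 2.0, -1.0] and right = [1.0, 2.5, -1.0], the difference signal is [0.0, 0.5, 0.0], which is cheap to code, and `ms_decode` returns the original pair.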

Alternatively, the mid-side coding process can be described by means of a
rotation matrix:

$$\begin{bmatrix} m[n] \\ s[n] \end{bmatrix} = c
\begin{bmatrix} \cos\frac{\pi}{4} & \sin\frac{\pi}{4} \\ -\sin\frac{\pi}{4} & \cos\frac{\pi}{4} \end{bmatrix}
\begin{bmatrix} l[n] \\ r[n] \end{bmatrix}$$

It is obvious that the left and right signals have been rotated over an angle of
π/4. This is illustrated in Figure D1. The sum signal can be interpreted as a projection of
the left and right samples onto the line l=r, whereas the difference signal can be interpreted as
a projection of the left and right samples onto the line l=-r.

¹ Usually the sum and difference are calculated as m[n] = c · (l[n] + r[n]) and
s[n] = c · (l[n] - r[n]). For explanatory reasons, the constant c has been discarded and the
sign of s[n] has been inverted.
In order to obtain a minimum signal power in the residual signal (i.e.,
maximum coding gain) for a wide class of input signals, the rotation angle needs to be
variable. An improved signal mapping, applied in a sub-band coding system using a variable
rotation angle, is outlined in [3, 4]. The following signal mapping is applied:

$$\begin{bmatrix} m'[n] \\ s'[n] \end{bmatrix} =
\begin{bmatrix} \cos(\alpha) & \sin(\alpha) \\ -\sin(\alpha) & \cos(\alpha) \end{bmatrix}
\begin{bmatrix} l[n] \\ r[n] \end{bmatrix}$$

where m'[n] and s'[n] represent the dominant and the residual signal respectively and the
angle α is chosen to minimize the power of the signal s'[n]. Due to the unitary rotation
operation, the power of m'[n] is then maximized. This process is illustrated in Figure D2.




Using regular M/S coding (as represented by m and s in Figure D1), the signal
illustrated in Figure D2 would still result in a residual signal with considerable energy. As
such, the coding gain obtained by M/S coding would be marginal. However, using the
variable rotation angle α as illustrated above, a very small residual signal can be obtained.
Obviously, this rotation technique works particularly well when the left signal is
approximately a scaled version of the right signal.

Both the M/S coding technique (i.e., rotation with a fixed angle) and the
variable rotation technique described above are typically not applied to the broadband signal,
but rather to signals (or frequency domain representations) representing only a smaller part of
the full bandwidth of the audio signal, as e.g. described in [3, 4].

Although the rotation technique as described in [3, 4] eliminates many of the
disadvantages of M/S coding, it is still suboptimal for signals having a strong phase or time
offset. This is illustrated in Figure D3.
Figure D3: Rotation of left and right signals along an angle α.

The oval-like structure indicates a time or phase delay between l and r.
Because the rotation implies a real-valued projection, the residual signal will still contain a
significant amount of energy (although usually less than using regular M/S coding).

In order to further reduce the residual signal energy, it is proposed to extend
the current signal rotation by applying complex-valued phase rotations to the left and right
signal components. From this point, it is assumed that the left and right signals are
represented by their complex-valued frequency domain representations l[k] and r[k]. One
method to obtain such a signal representation is as follows. First the left and right time domain
signal segments are windowed:

$$l_q[n] = l[n + qH] \cdot h[n]$$

$$r_q[n] = r[n + qH] \cdot h[n]$$

where q represents the frame index (q = 0, 1, 2, ...), H represents the hop-size or update-size
and n = 0..L-1, where L equals the length of window h[n]. These windowed segments are
then transformed to the frequency domain by means of a Discrete Fourier Transform (DFT):



20 
$$l[k] = \sum_{n=0}^{N-1} l_q[n] \cdot \exp\left( -j \frac{2\pi k n}{N} \right)$$

$$r[k] = \sum_{n=0}^{N-1} r_q[n] \cdot \exp\left( -j \frac{2\pi k n}{N} \right)$$

where N represents the DFT length (N ≥ L). Because of symmetry in the DFT only the first
N/2+1 points are preserved. Furthermore, in order to obtain energy preservation, the first
DFT points are scaled:

l[0] = l[0]/2
r[0] = r[0]/2


The signal model is extended in the following way:

$$\begin{bmatrix} m[k] \\ s[k] \end{bmatrix} =
\begin{bmatrix} \cos(\alpha) & \sin(\alpha) \\ -\sin(\alpha) & \cos(\alpha) \end{bmatrix}
\begin{bmatrix} e^{j\varphi_1} & 0 \\ 0 & e^{j(\varphi_1 + \varphi_2)} \end{bmatrix}
\begin{bmatrix} l[k] \\ r[k] \end{bmatrix}$$

As can be observed from the equation above, the real-valued rotation (using a
variable rotation angle α) is extended with a complex-valued phase modification matrix. The
angle φ_2 is used to minimize the energy of the residual signal by (phase-)rotating the right
signal. The common angle φ_1 can be used to maximize the continuation of the signal over
frame boundaries.
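
The effect of the additional phase rotation can be illustrated on a single tile of complex bins. The sketch below is a simplified illustration under stated assumptions: the function name is hypothetical, the common angle φ_1 is set to 0, φ_2 is taken as the phase of the cross-spectrum so that the right signal is phase-aligned with the left one, and α is then chosen to maximize the dominant energy.

```python
import cmath
import math


def phase_rotation_map(l_bins, r_bins):
    """Map (l, r) to (dominant, residual); returns (m, s, alpha, phi2)."""
    cross = sum(a * b.conjugate() for a, b in zip(l_bins, r_bins))
    phi2 = cmath.phase(cross)                       # aligns r with l (assumed choice)
    r_aligned = [b * cmath.exp(1j * phi2) for b in r_bins]
    p_l = sum(abs(a) ** 2 for a in l_bins)
    p_r = sum(abs(b) ** 2 for b in r_bins)
    # After alignment the cross term is real and equal to |cross|.
    alpha = 0.5 * math.atan2(2.0 * abs(cross), p_l - p_r)
    c, s = math.cos(alpha), math.sin(alpha)
    m = [c * a + s * b for a, b in zip(l_bins, r_aligned)]
    res = [-s * a + c * b for a, b in zip(l_bins, r_aligned)]
    return m, res, alpha, phi2
```

If the right bins equal the left bins at half the amplitude with a phase lag of 1.4 rad, the sketch finds φ_2 = 1.4 and the residual vanishes, whereas a purely real-valued rotation could not remove the phase offset.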

After signal mapping/modification the dominant and residual time domain
signals m[n] and s[n] are obtained by first applying the inverse DFT on the frequency
domain representations m[k] and s[k]:

$$m_q[n] = \sum_{k=0}^{N-1} m[k] \cdot \exp\left( j \frac{2\pi k n}{N} \right)$$

$$s_q[n] = \sum_{k=0}^{N-1} s[k] \cdot \exp\left( j \frac{2\pi k n}{N} \right)$$

where the dominant and residual frequency domain representations m[k] and s[k] have been
zero-padded to length N. The time domain signals are then obtained by means of
overlap-add:

$$m[n + qH] = m[n + qH] + 2\Re\{ m_q[n] \cdot h[n] \}$$

$$s[n + qH] = s[n + qH] + 2\Re\{ s_q[n] \cdot h[n] \}$$
Alternatively, complex-modulated filter banks could be employed to obtain a
complex-valued frequency domain representation.

As an example, the following synthetic signal is mapped by the three different
methods described above:

l[n] = 0.5 cos(0.32n + 0.4) + 0.05 · z_1[n] + 0.06 · z_2[n]
r[n] = 0.25 cos(0.32n + 1.8) + 0.03 · z_1[n] + 0.05 · z_3[n]

where z_1[n], z_2[n] and z_3[n] are independent white noise sequences with unit variance. Part
of the signals l[n] and r[n] are shown in Figure D4.
Figure D4: Left and right signals l[n] and r[n] respectively

Figure D5, Figure D6 and Figure D7 show the results for the M/S transform,
the signal rotation over an (optimal) angle α, and the rotation over both an (optimal) angle α
and phase rotation as proposed in this ID respectively. In this particular example the angles
α, φ_1 and φ_2 are fixed. In a typical embodiment, these parameters are both time and
frequency dependent. In each figure, the top panel represents the dominant signal (m[n]); the
bottom panel shows the residual signal (s[n]).

The M/S mapping, as shown in Figure D5, clearly does not increase the coding
efficiency for this particular situation. As a matter of fact, the residual signal energy, i.e., the
energy of the signal s[n], is higher than the energy of the input signal r[n].



Figure D5: Resulting signals after M/S mapping

The mapping by means of applying the (optimal) signal rotation α, as
illustrated in Figure D6, also does not help for this particular signal. Only a negligible energy
reduction is obtained.
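
The two observations above can be reproduced numerically with a short time-domain sketch. The function names are hypothetical, the white noise sequences are assumed to be Gaussian with a fixed seed, and the rotation angle is computed from the measured powers and cross-correlation so that the residual power is minimal.

```python
import math
import random


def synthetic(n_samples, seed=0):
    """Generate l[n] and r[n] of the example; Gaussian noise is an assumption."""
    rng = random.Random(seed)
    left, right = [], []
    for n in range(n_samples):
        z1, z2, z3 = (rng.gauss(0.0, 1.0) for _ in range(3))
        left.append(0.5 * math.cos(0.32 * n + 0.4) + 0.05 * z1 + 0.06 * z2)
        right.append(0.25 * math.cos(0.32 * n + 1.8) + 0.03 * z1 + 0.05 * z3)
    return left, right


def power(x):
    return sum(v * v for v in x) / len(x)


def residual_powers(left, right):
    """Return (power of r, M/S residual power, residual power after optimal rotation)."""
    p_l, p_r = power(left), power(right)
    cross = sum(a * b for a, b in zip(left, right)) / len(left)
    ms = p_l + p_r - 2.0 * cross                    # power of s[n] = r[n] - l[n]
    alpha = 0.5 * math.atan2(2.0 * cross, p_l - p_r)
    c, s = math.cos(alpha), math.sin(alpha)
    rot = power([-s * a + c * b for a, b in zip(left, right)])
    return p_r, ms, rot
```

Calling `residual_powers(*synthetic(20000))` gives an M/S residual power well above the power of r[n], and a rotated residual only slightly below it, in line with Figures D5 and D6; the 1.4 rad phase offset between the two sinusoids is what the real-valued rotation cannot remove.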
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Figure D6: Resulting signals after rotation 

Finally, the results of the extension of the mapping as proposed in this ID are
shown in Figure D7. Here a clear reduction of residual signal energy is observed.
Figure D7: Resulting signals after signal and phase rotation
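
The comparison of the three mappings can be approximated numerically. The following is a
minimal sketch, not the procedure used to produce the figures. It assumes an M/S residual
of the form (l - r)/2, a naive DFT over a single 256-sample segment as the complex-valued
representation, one global phase angle in place of separate $\varphi_1$ and $\varphi_2$,
and an arbitrary random seed. All function names are hypothetical.

```python
import cmath
import math
import random


def synth(n_samples, seed=0):
    """Synthetic l[n], r[n] from the text; the seed is an arbitrary choice."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    z2 = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    z3 = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    l = [0.5 * math.cos(0.32 * n + 0.4) + 0.05 * z1[n] + 0.06 * z2[n]
         for n in range(n_samples)]
    r = [0.25 * math.cos(0.32 * n + 1.8) + 0.03 * z1[n] + 0.05 * z3[n]
         for n in range(n_samples)]
    return l, r


def half_dft(x):
    """Naive DFT, keeping only the non-negative frequency bins."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def energy(v):
    return sum(abs(c) ** 2 for c in v)


def rotation_residual(lf, rf):
    """Residual energy after rotation over the energy-optimal real angle."""
    e_l, e_r = energy(lf), energy(rf)
    cross = sum((a * b.conjugate()).real for a, b in zip(lf, rf))
    alpha = 0.5 * math.atan2(2.0 * cross, e_l - e_r)
    s = [-math.sin(alpha) * a + math.cos(alpha) * b for a, b in zip(lf, rf)]
    return energy(s)


def residual_energies(n_samples=256, seed=0):
    """Residual energies of the three mappings, relative to the energy of r."""
    l, r = synth(n_samples, seed)
    lf, rf = half_dft(l), half_dft(r)
    e_ms = energy([(a - b) / 2 for a, b in zip(lf, rf)])  # assumed M/S form
    e_rot = rotation_residual(lf, rf)
    # Assumed single phase angle: align r with l to maximise the correlation.
    phi = cmath.phase(sum(a * b.conjugate() for a, b in zip(lf, rf)))
    rf_aligned = [b * cmath.exp(1j * phi) for b in rf]
    e_phase = rotation_residual(lf, rf_aligned)
    e_ref = energy(rf)
    return e_ms / e_ref, e_rot / e_ref, e_phase / e_ref


if __name__ == "__main__":
    for name, value in zip(("M/S", "rotation", "rotation + phase"),
                           residual_energies()):
        print(f"{name}: residual energy relative to r = {value:.3f}")
```

Under these assumptions the residual after combined phase and signal rotation should be
considerably smaller than after rotation alone, in line with the behaviour described for
Figures D5 to D7.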



A block diagram of an encoder according to the invention is given in Figure
D8. The left and right frequency domain representations $l$ and $r$ are phase rotated to obtain

- a maximum coherence (angle $\varphi_2$) and
- an optimal signal continuation over time (angle $\varphi_1$).

Subsequently, the phase-rotated left and right signals are rotated over an angle
$\alpha$ for maximal reduction of the residual signal error as described above. The parameters
$\alpha$, $\varphi_1$ and $\varphi_2$ are quantized and coded into the bit stream. The dominant
signal $m$ and the residual signal $s$ can be coded by two independent conventional mono audio
coders (or, of course, by one dual mono encoder). Additionally, certain parts of the
time-frequency plane of the signal $s$ that do not contribute perceptually to the final output
signal can be discarded in the time-frequency (t/f) selector unit. The overall bit stream is
formed by merging the bit streams corresponding to the dominant signal $m$ and the residual
signal $s$ with the parameter bit stream.
Figure D8: Block diagram of proposed encoder



Figure D9 shows a block diagram of the decoder corresponding to the encoder
of Figure D8. First the bit stream is de-multiplexed into separate bit streams for the dominant
signal, the residual signal and the parameters. The bit streams for the dominant signal and the
residual signal are decoded, resulting in the signals $m'$ and $s'$. Then the inverse rotation
($-\alpha$) is applied to obtain preliminary left and right signal representations. Finally the
left and right signals, $l'$ and $r'$ respectively, are obtained by applying the inverse phase
rotations ($-\varphi_1$ and $-\varphi_2$).
Figure D9: Block diagram of proposed decoder
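
The encoder mapping and its inversion in the decoder can be illustrated for a single
complex frequency bin. This is a sketch under assumptions: the way the two phase angles
are distributed over the channels (here `phi1` on the left and `phi2` on the right
channel) is chosen for illustration only, quantization and the mono coders are omitted,
and the function names are hypothetical.

```python
import cmath
import math


def encode_bin(l, r, phi1, phi2, alpha):
    """Map one complex bin pair (l, r) to a dominant m and a residual s."""
    # Assumed convention: phi1 rotates the left bin, phi2 the right bin.
    l_p = l * cmath.exp(1j * phi1)
    r_p = r * cmath.exp(1j * phi2)
    m = math.cos(alpha) * l_p + math.sin(alpha) * r_p
    s = -math.sin(alpha) * l_p + math.cos(alpha) * r_p
    return m, s


def decode_bin(m, s, phi1, phi2, alpha):
    """Apply the inverse rotation (-alpha), then the inverse phase rotations."""
    l_p = math.cos(alpha) * m - math.sin(alpha) * s
    r_p = math.sin(alpha) * m + math.cos(alpha) * s
    return l_p * cmath.exp(-1j * phi1), r_p * cmath.exp(-1j * phi2)


if __name__ == "__main__":
    # Noise-free sinusoid components of the synthetic example, as phasors.
    l = 0.5 * cmath.exp(0.4j)
    r = 0.25 * cmath.exp(1.8j)
    alpha = math.atan2(0.25, 0.5)
    m, s = encode_bin(l, r, 0.0, -1.4, alpha)
    print(abs(m), abs(s))  # the residual vanishes for this noise-free pair
    print(decode_bin(m, s, 0.0, -1.4, alpha))
```

Without quantization the round trip is exact, which is why the residual can be coded
coarsely or partly discarded without affecting the dominant signal.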
The invention can also be advantageously applied in combination with a
parametric stereo coding system [5]. A general block diagram of a parametric stereo coding
scheme is given in Figure D10. It is fairly similar to the block diagram of the proposed
encoder except for the fact that:

- no residual signal is transmitted and
- the angle $\alpha$ is not transmitted, but instead an IID value and a coherence value are
transmitted.

The IID value represents the Inter-channel Intensity Difference, denoting the
(frequency and time variant) intensity differences between the left and right input channels.
The coherence value denotes the coherence, i.e., the similarity, between the left and right
input channels after phase synchronization. The angle $\alpha$ can be derived from the IID and
coherence value.
Figure D10: Block diagram of parametric stereo encoder

A corresponding decoder block diagram is shown in Figure D11. It
corresponds to the block diagram of Figure D9 except that:

- the residual signal $s'$ is now estimated from the dominant signal $m'$ by means of a
de-correlation process D and

- the amount of coherence between the left and right output signals is determined by a
scaling operation.

The scaling operation basically describes the ratio between the dominant
signal and the residual signal.
Figure D11: Block diagram of parametric stereo decoder
Figure D12 shows a block diagram of the parametric stereo encoder enhanced
with residual coding. With respect to Figure D10, the only difference resides in the coding of
(part of) the residual signal $s$. Which part of the residual signal is coded by Coder 2 is
determined by the time-frequency (t/f) selector unit.
Figure D12: Block diagram of enhanced parametric stereo encoder

A block diagram of the parametric stereo decoder enhanced with residual
coding is shown in Figure D13. The bit stream is first de-multiplexed into separate streams
for the dominant signal, the residual signal and the stereo parameters. The dominant and
residual signals are then decoded by Decoder 1 and Decoder 2, respectively. Those
spectral/temporal parts of the residual signal which have been coded are signalled either:
- implicitly, by detecting "empty" areas in the time-frequency plane or
- explicitly, by means of bits in the parameter bit stream.

This information is applied in the de-correlation unit D and the Combine unit
to fill the empty time-frequency areas in the decoded residual signal with a synthetic residual
signal. This synthetic residual signal is generated by using the decoded dominant signal $m'$
and the de-correlation unit D. For all other time-frequency areas, the (transmitted) residual
signal is applied to construct the signal $s'$. Note that for these areas, no scaling is applied.
Hence, for these areas it can be advantageous to transmit the angle $\alpha$ in the encoder
instead of the IID and coherence values, as the bit rate required for the single parameter
$\alpha$ is smaller than the bit rate required for the IID and coherence parameters. However,
transmission of $\alpha$ instead of IID and coherence values makes the system not backward
compatible with the regular PS system. The subsequent stages of the decoder operate in the same
fashion as the conventional parametric stereo decoder.
Figure D13: Block diagram of enhanced parametric stereo decoder
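
The role of the Combine unit can be sketched as a per-tile selection between the decoded
and the synthetic residual. This is an illustration only: it assumes that each
time-frequency tile is a list of coefficients, that the synthetic residual has already
been derived from $m'$ by some de-correlator, and that implicit signalling means "all
coefficients in the tile are zero". The function names are hypothetical.

```python
def tile_is_empty(tile):
    """Implicit signalling (assumed): a tile counts as not coded if it is all zero."""
    return all(c == 0 for c in tile)


def combine_residual(decoded_tiles, synthetic_tiles, coded_flags=None):
    """Build s' tile by tile.

    coded_flags carries the explicit signalling (one boolean per tile);
    when it is None the flags are inferred from empty decoded tiles.
    """
    if coded_flags is None:
        coded_flags = [not tile_is_empty(t) for t in decoded_tiles]
    return [list(dec) if coded else list(syn)
            for dec, syn, coded in zip(decoded_tiles, synthetic_tiles, coded_flags)]


if __name__ == "__main__":
    decoded = [[0.1, -0.2], [0.0, 0.0], [0.3, 0.0]]
    synthetic = [[9.0, 9.0], [0.5, -0.5], [9.0, 9.0]]
    print(combine_residual(decoded, synthetic))
```

With explicit flags a transmitted tile that happens to be all zero is still taken from the
decoded residual, which the implicit variant cannot distinguish from a discarded tile.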

The criteria for deciding which time-frequency areas of the residual signal
need to be coded are perceptually motivated. If (a time-frequency area of) the residual signal
$s$ does not contribute to the audio quality of the final decoder output signal, or if (a
time-frequency area of) the de-correlated signal forms a perceptually valid representation of
the (corresponding time-frequency area of the) residual signal, it is not necessary to code the
residual signal.

By coding different time-frequency aspects of the residual signal in the
encoder and by multiplexing the corresponding data into a scalable bit stream, the enhanced
parametric stereo codec can be extended to a bit-rate scalable codec. In a scalable system
where the layers in the bit stream are dependent, the coded data corresponding to the
perceptually most relevant time-frequency aspects should be placed in the base layer, and the
less important data moved to refinement, or enhancement, layers. In this case the base layer
would consist of the dominant bit stream, a first enhancement layer would consist of the
stereo parameters and a second enhancement layer would consist of the residual bit stream.

When layers are removed from the scalable bit stream, and information
regarding the residual signal is thus lost, the enhanced parametric stereo decoder can combine
the decoded residual signal reconstructed from the data in the remaining layers with the
synthetic residual signal in the manner described above to form a meaningful residual signal.
Furthermore, if a decoder is not equipped with a second waveform decoder (for the residual
signal), e.g. due to complexity restrictions, the signal could still be decoded, although this
would result in a lower quality level.

Further bit-rate reductions can be obtained by discarding the values for $\varphi_1$ and
$\varphi_2$ in the bit stream. In that case, the decoder reconstructs the output signals $l'$
and $r'$ using a phase rotation of zero. This method effectively exploits the lack of
sensitivity of the human auditory system to high-frequency (inter-aural) phase information.
Annex E Post-processing of the individual multi-channel signals in the stereo down-mix

Many of the multi-channel audio coders described hereinbefore generate a
2-channel down-mix to be compatible with 2-channel reproduction systems. When this
down-mix is created, the spatial impression of the original multi-channel mix is lost. The
current invention makes it possible to improve the spatial image of the down-mix after its
creation, based upon the parameters determined in the multi-channel encoder. Other
post-processing techniques on the individual multi-channel contributions are made possible
as well.

The proposed method allows a reconstruction of the multi-channel mix that is not
affected by the post-processing. Post-processing in the decoder is also possible for
stereo playback as a user-selectable option, without the necessity to determine the
multi-channel signal first.

Without post-processing the down-mix is comparable with the standard ITU
down-mix. The proposed method, however, improves the down-mix significantly. This is a
very important issue, because it is very probable that the quality of the down-mix will be one
of the selection criteria within MPEG.

The proposed method is able to determine the contribution in the down-mix of
the original channels in the multi-channel mix with the help of the parameters determined in
the encoder. In this way post-processing can be applied to specific channels of the
multi-channel mix (for example: stereo widening of the rear channels), whilst the other
channels are not affected. The post-processing does not affect the final multi-channel
reconstruction. It can also be applied for an improved stereo playback without the necessity
to determine the multi-channel mix first.

This method differs from existing post-processing techniques in that it uses the
knowledge of the original multi-channel mix (the determined parameters).

Encoder 

Assume an N-channel audio signal, where $z_1[n], z_2[n], \ldots, z_N[n]$ describe
the discrete time-domain waveforms of the N channels. These N signals are segmented using
a common segmentation, preferably using overlapping analysis windows. Subsequently, each
segment is converted to the frequency domain using a complex transform (e.g., FFT).
However, complex filter-bank structures may also be appropriate to obtain time/frequency
tiles. This process results in segmented, subband representations of the input signals (which
will be denoted by $Z_1[k], Z_2[k], \ldots, Z_N[k]$ with $k$ denoting the frequency index).

From these N channels, 2 down-mix channels are created, being $L_0[k]$ and
$R_0[k]$. Each down-mix channel is a linear combination of the N input signals:

$$L_0[k] = \sum_{i=1}^{N} \alpha_i Z_i[k]$$
$$R_0[k] = \sum_{i=1}^{N} \beta_i Z_i[k]$$

The parameters $\alpha_i$ and $\beta_i$ are chosen such that the stereo signal consisting of
$L_0[k]$ and $R_0[k]$ has a good stereo image.

In the decoder the N input channels are reconstructed as follows:

$$\hat{Z}_i[k] = C_{1,Z_i} L_0[k] + C_{2,Z_i} R_0[k],$$

where $\hat{Z}_i[k]$ is an estimate of $Z_i[k]$. The parameters $C_{1,Z_i}$ and $C_{2,Z_i}$ are
determined in the encoder and transmitted to the decoder.
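
The down-mix and the parametric reconstruction can be sketched as follows. The text does
not state how the encoder determines the two weights per channel; the least-squares
criterion used here is an assumption made for illustration, as are the function names and
the example data.

```python
def inner(x, y):
    """Complex inner product, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def downmix(channels, alpha, beta):
    """L0[k] = sum_i alpha_i Z_i[k] and R0[k] = sum_i beta_i Z_i[k]."""
    n_bins = len(channels[0])
    l0 = [sum(a * ch[k] for a, ch in zip(alpha, channels)) for k in range(n_bins)]
    r0 = [sum(b * ch[k] for b, ch in zip(beta, channels)) for k in range(n_bins)]
    return l0, r0


def ls_weights(z, l0, r0):
    """Weights (C1, C2) minimising sum_k |z[k] - C1*l0[k] - C2*r0[k]|^2.

    Assumed criterion; solves the 2x2 normal equations with Cramer's rule.
    """
    a11, a12 = inner(l0, l0), inner(r0, l0)
    a21, a22 = inner(l0, r0), inner(r0, r0)
    b1, b2 = inner(z, l0), inner(z, r0)
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


def reconstruct(c1, c2, l0, r0):
    """Decoder side: estimate of Z_i[k] from the two down-mix channels."""
    return [c1 * a + c2 * b for a, b in zip(l0, r0)]


if __name__ == "__main__":
    # Three hypothetical channels with four frequency bins each.
    chans = [[1 + 0j, 2j, -1 + 1j, 0.5 + 0j],
             [0.3 + 0j, 1 - 1j, 2 + 0j, -1j],
             [0.2 + 0.1j, -0.4 + 0j, 0.1j, 0.7 + 0j]]
    l0, r0 = downmix(chans, [1.0, 0.0, 0.7], [0.0, 1.0, 0.7])
    for z in chans:
        c1, c2 = ls_weights(z, l0, r0)
        err = sum(abs(a - b) ** 2 for a, b in zip(z, reconstruct(c1, c2, l0, r0)))
        print(c1, c2, err)
```

Because only two weights per channel (and per parameter band) are transmitted, the
reconstruction is in general an estimate rather than an exact copy of the input channel.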



Figure E1: The positioning of the post-processing and inverse post-processing block in the
multi-channel coder.
Figure E2: Basic scheme for post-processing of the stereo down-mix.



Post-processing can be applied to the resulting stereo signal in such a way that it
mainly affects the contribution of $Z_i[k]$ in the stereo mix. Figure E1 shows the position
of this block in the codec.

Figure E2 shows what this post-processing block looks like. The parameter $w_l$
determines the amount of post-processing of $L_0[k]$ and $w_r$ that of $R_0[k]$. When $w_l$ is
equal to 0, $L_0[k]$ is unaffected, and when $w_l$ is equal to 1, $L_0[k]$ is maximally
affected. The same holds for $w_r$ with respect to $R_0[k]$.



The following equations hold for the post-processing parameters $w_l$ and $w_r$



The blocks $H_1 \ldots H_4$ in Figure E1 are filters, which can be of various types,
for example stereo widening filters (as shown at the end of this section). The resulting
outputs are:



$$\begin{bmatrix} L_{0w} \\ R_{0w} \end{bmatrix} = H \begin{bmatrix} L_0 \\ R_0 \end{bmatrix},$$

with:

$$H = \begin{bmatrix} 1 - w_l + w_l H_1 & w_r H_3 \\ w_l H_2 & 1 - w_r + w_r H_4 \end{bmatrix}.$$

When the filters $H_1 \ldots H_4$ are chosen properly, the matrix $H$ is
invertible. The inversion can be carried out in the decoder, provided the filters are known
there, because the parameters $w_l$ and $w_r$ can be calculated from the transmitted
parameters. The original stereo signal thus becomes available again, which is necessary for
decoding the multi-channel mix.
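
The post-processing and its inversion can be sketched for a single frequency bin, where
each filter reduces to one complex gain. The matrix layout below follows the structure
described above for the post-processing block; the numeric filter gains and weights in
the usage example are hypothetical, and the function names are illustrative.

```python
def postproc_matrix(w_l, w_r, h1, h2, h3, h4):
    """2x2 post-processing matrix for one frequency bin."""
    return ((1 - w_l + w_l * h1, w_r * h3),
            (w_l * h2, 1 - w_r + w_r * h4))


def apply_matrix(mat, l0, r0):
    """Multiply the 2x2 matrix with the column vector (l0, r0)."""
    (a, b), (c, d) = mat
    return a * l0 + b * r0, c * l0 + d * r0


def invert_matrix(mat, eps=1e-12):
    """Closed-form 2x2 inverse; fails if the filters make the matrix singular."""
    (a, b), (c, d) = mat
    det = a * d - b * c
    if abs(det) < eps:
        raise ValueError("post-processing matrix is not invertible for this bin")
    return ((d / det, -b / det), (-c / det, a / det))


if __name__ == "__main__":
    # Hypothetical weights and per-bin filter gains.
    mat = postproc_matrix(0.6, 0.3, 1.12, -0.96, -0.96, 1.12)
    l0w, r0w = apply_matrix(mat, 0.5 + 0.2j, -0.1 + 0.4j)
    print(apply_matrix(invert_matrix(mat), l0w, r0w))
```

With both weights equal to 0 the matrix reduces to the identity, so the down-mix passes
through unchanged.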

Another possibility is to transmit the original stereo signal and apply the post- 
processing in the decoder to make improved stereo playback possible without the necessity to 
determine the multi- channel mix first. 


Incorporation of the coder in a multi-channel coder described in Annex A, B or C 

On these pages an encoder is described that codes 5-channel audio. The 
following equations are applied: 

$$L_0[k] = L[k] + Cs[k]$$
$$R_0[k] = R[k] + Cs[k],$$

in which $Cs[k]$ is the mono signal that results after applying OCS between the LFE
(subwoofer) channel and the center channel. For $L[k]$ and $R[k]$ the following equations hold:

$$L[k] = C_l \left( \cos(\alpha_l)\, L_f + \sin(\alpha_l)\, e^{j\,IPD_l} L_s \right)$$
$$R[k] = C_r \left( \cos(\alpha_r)\, R_f + \sin(\alpha_r)\, e^{j\,IPD_r} R_s \right),$$

where $L_f$ is the left/front, $L_s$ the left/surround, $R_f$ the right/front and $R_s$ the
right/surround channel. The parameters $IPD_l$ and $IPD_r$ (inter-channel phase differences)
and $C_l$ and $C_r$ (complex scaling parameters) are parameters that are determined in the OCS
encoder. The angles in these equations can be calculated from the inter-channel intensity
differences (IID) and the normalized cross-correlation (IC):



$$\alpha_l = 0.5 \arctan\left( \frac{2\, IC_l\, 10^{IID_l/20}}{10^{IID_l/10} - 1} \right)$$
$$\alpha_r = 0.5 \arctan\left( \frac{2\, IC_r\, 10^{IID_r/20}}{10^{IID_r/10} - 1} \right).$$
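
The angle computation can be written out directly. The sketch below evaluates the
expression as given, with the IID taken in dB; the handling of the special case IID = 0 dB,
where the denominator vanishes, is an assumption (the limit of the arctangent is used),
and the function name is hypothetical.

```python
import math


def rotation_angle(iid_db, ic):
    """alpha = 0.5 * arctan(2 * IC * 10^(IID/20) / (10^(IID/10) - 1))."""
    c = 10.0 ** (iid_db / 20.0)          # amplitude ratio corresponding to the IID
    num = 2.0 * ic * c
    den = c * c - 1.0                    # equals 10^(IID/10) - 1
    if den == 0.0:
        # Assumed handling of equal intensities: take the limit of arctan.
        return math.copysign(math.pi / 4.0, num) if num != 0.0 else 0.0
    return 0.5 * math.atan(num / den)


if __name__ == "__main__":
    for iid_db, ic in ((0.0, 1.0), (6.0, 0.8), (60.0, 1.0), (6.0, 0.0)):
        print(iid_db, ic, rotation_angle(iid_db, ic))
```

For fully correlated channels of equal intensity the angle is $\pi/4$, i.e. both channels
contribute equally; for a very large IID or for uncorrelated channels the angle tends to 0.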



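In the decoder the following reconstruction is performed:

$$\hat{L}[k] = \beta L_0[k] + (\gamma - 1) R_0[k]$$
$$\hat{R}[k] = (\beta - 1) L_0[k] + \gamma R_0[k]$$
$$\hat{C}[k] = (1 - \beta) L_0[k] + (1 - \gamma) R_0[k],$$

where $\hat{L}[k]$ is an estimate of $L[k]$, $\hat{R}[k]$ an estimate of $R[k]$ and $\hat{C}[k]$
an estimate of $Cs[k]$. The parameters $\beta$ and $\gamma$ are determined in the encoder and
transmitted to the decoder.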



Knowing all this, the functions that are used for the post-processing are:

$$w_l = f_1(\alpha_l)\, f_2(\beta)$$
$$w_r = f_3(\alpha_r)\, f_4(\gamma),$$

where $f_1 \ldots f_4$ can be any function. For example, to apply stereo widening on the rear
channels:

$$f_1(\alpha) = f_3(\alpha) = \sin(\alpha)$$
$$f_2(\beta) = f_4(\beta) = \begin{cases} \sin(0.5\pi\beta) & \text{if } 0 \le \beta \le 1 \\ 1 & \text{if } \beta > 1 \\ 0 & \text{if } \beta < 0. \end{cases}$$

For the filter functions $H_i$ the following functions are then chosen (in the z-domain):

$$H_1(z) = H_4(z) = 0.8\,(1.0 + 0.2 z^{-1} + 0.2 z^{-2})$$
$$H_2(z) = H_3(z) = 0.8\,(-1.0 z^{-1} - 0.2 z^{-2}).$$
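
The example weighting functions and filters can be evaluated numerically. The sketch below
is illustrative: it evaluates the filters on the unit circle ($z = e^{j\omega}$) to obtain
per-frequency gains, treats the interval for $\beta$ as closed at both ends, and uses
hypothetical function names.

```python
import cmath
import math


def f_angle(alpha):
    """f1 = f3: sine of the rotation angle."""
    return math.sin(alpha)


def f_pred(beta):
    """f2 = f4: 0 below 0, 1 above 1, sin(0.5*pi*beta) in between."""
    if beta < 0.0:
        return 0.0
    if beta > 1.0:
        return 1.0
    return math.sin(0.5 * math.pi * beta)


def post_weights(alpha_l, alpha_r, beta, gamma):
    """w_l = f1(alpha_l) * f2(beta) and w_r = f3(alpha_r) * f4(gamma)."""
    return f_angle(alpha_l) * f_pred(beta), f_angle(alpha_r) * f_pred(gamma)


def h_direct(omega):
    """H1 = H4 evaluated at z = exp(j*omega)."""
    z1 = cmath.exp(-1j * omega)          # z^-1
    return 0.8 * (1.0 + 0.2 * z1 + 0.2 * z1 * z1)


def h_cross(omega):
    """H2 = H3 evaluated at z = exp(j*omega)."""
    z1 = cmath.exp(-1j * omega)          # z^-1
    return 0.8 * (-1.0 * z1 - 0.2 * z1 * z1)


if __name__ == "__main__":
    print(post_weights(math.pi / 4, math.pi / 6, 0.5, 1.2))
    for omega in (0.0, math.pi / 2, math.pi):
        print(omega, h_direct(omega), h_cross(omega))
```

The cross filters have a negative gain at low frequencies, which is what feeds an
inverted, delayed copy of the opposite channel into each output.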



This invention can be applied in any multi-channel audio coder that creates a
compatible stereo down-mix. The invention can be applied in two ways:
- It can be applied before transmission to provide an improved down-mix for decoders that
can only decode the stereo audio.
- It can also be applied in decoders that can handle the whole bit stream, as a
user-selectable option to make improved stereo reproduction possible.
CLAIMS:



1. An audio encoder as described hereinbefore. 

2. An audio decoder as described hereinbefore. 
3. An audio encoding method as described hereinbefore.

4. An audio decoding method as described hereinbefore. 

5. An audio signal as produced by the method of Claim 3. 


6. A storage medium having stored thereon a signal as claimed in Claim 5. 




ABSTRACT: 



This document gives a technical description of a multi-channel parametric
audio coding system. The goal of this system is to describe an m-channel signal by an
n-channel signal, with n<m, and parameters describing a spatial image in order to reconstruct
the m-channel signal.



Fig. 1